Data Driven Loops 

Part I: Introduction 

I.1 The HIBOL Language: A Brief Introduction 

The notion of the data driven loop arises in connection with our work in the Very High 
Level Language HIBOL and the automatic programming system (ProtoSystem I) that supports it. 
Although the concept is of general interest outside of VHLL's and automatic programming, we 
find it profitable to use HIBOL as a vehicle for our discussion and a means of narrowing the 
scope of our discussion. Therefore we first present a brief description of the domain which 
HIBOL treats. 

I.1.1 Flows 

The HIBOL language concerns a restricted but significant subset of all data processing 
applications: batch oriented systems involving the repetitive processing of indexed records from 
data files. It provides a concise and powerful way of dealing with data aggregates: HIBOL has a 
single data type, the flow. This construct is a (possibly named) data aggregate and represents a 
collection of uniform records that are individually and uniquely indexed by a multi-component 
index. The components of a flow's index are called keys, and the set of an index's keys is called its 
key-tuple.¹ Each record has a single data field (datum) in addition to the index information. 
(Real-world data aggregates, such as files, with more than one datum per logical record are 
abstracted in HIBOL as separate flows, one for each data field.) 



1 This term is historical. A more expressive term would be "key set", but that has historically 
been used to indicate the universe from which a key may take its values. 
I.1.2 Flow Expressions 

Flow expressions can be formed through the application of arithmetic operators such as "+" 
or "*" to flows. The meaning of such an application to two flows is that the operation is applied to 
the data of corresponding records (those with matching indices) of the argument flows. The result 
is a new flow, having a record for each matched pair for which the operation was performed. The 
index value of such a record is identical to that of the matched pair, and the datum value is the 
result of the operation performed on the data of the pair. This concept is generalized to an 
arbitrary number of flow arguments. 

Flow expressions can also be constructed using a conditional operator (similar to a "CASE" 
statement) which evaluates logical expressions in terms of corresponding flow records in order to 
select and then compute an expression as the individual records of the flows are processed. The 
logical expressions are constructed using the arithmetic comparison operators ">", "=", and "<". In 
addition the PRESENT operator may be used to test the presence of a record in a flow for a given 
value of the index of that flow. These may be composed using the logical connectives "AND", "OR" 
and "NOT". 

Finally, there is a class of reduction operators permitted on flows and flow expressions. The 
function of such an operator is to reduce a flow with an n-key index to one with an m-key index, 
where m < n, and the key-tuple of the m-key index is a subset of the key-tuple of the n-key index. 
All records of the argument flow that correspond to a single record of the result form a set to which 
a reduction operator (e.g. "maximum", "sum") can be applied to obtain a single value. 
I.1.3 Flow Equations 

Relationships between flows are expressed by flow equations of the form: 

    <flow-name> IS <flow-expression> 

where <flow-name> is a named flow and <flow-expression> is a flow expression in terms of 
named flows. The right- and left-hand sides must have identical indices. 

I.1.4 Example 

Consider a chain of stores whose items are supplied from a central warehouse. The collection 
of store orders for item restocking on a given day can be thought of as a flow called, say, 
CURRENTORDER. A record of that flow contains the quantity ordered by a particular store of a 
particular item. Each record has as its datum the quantity ordered and a 2-component index 
identifying the store making the order and the item ordered (the keys of the index are a store-id 
and an item-id). Let BACKORDER be the name of a flow (of similar structure) representing the 
collection of (quantities of) previous orders that could either not be filled or be filled only partially. 
The HIBOL statement 

    DEMAND IS CURRENTORDER + BACKORDER 

describes a new flow DEMAND representing the total demand of each item by each store. That is, 
each record in DEMAND contains a 2-component (item-id, store-id) index identifying its datum, which 
is the sum of the data for the same item and store in the CURRENTORDER and BACKORDER flows. 

The HIBOL statement 

    ITEMDEMAND IS THE SUM OF DEMAND FOR EACH ITEM-ID 

illustrates the use of the reduction operator SUM. It describes a new flow ITEMDEMAND representing 
the total demand of each item from all stores. That is, each of its records has a single-component 
index (item-id) identifying a particular item, and its datum is the total quantity in demand summed 
across all stores in the chain. 
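
As an illustration only, the two equations of this example can be mimicked in Python by modelling a flow as a dictionary from index tuples to data. The dictionary representation, the sample quantities, and the helper name `sum_for_each` are assumptions made for this sketch and are not part of HIBOL. Every index in the sample data occurs in both input flows, so the question of missing operands does not arise here.

```python
from collections import defaultdict

# A flow is modelled as a dict: index tuple -> datum (an assumption of this sketch).
# Index layout: (item_id, store_id). The quantities are made-up sample data.
CURRENTORDER = {("nails", "s1"): 5, ("nails", "s2"): 2, ("tape", "s1"): 1}
BACKORDER = {("nails", "s1"): 3, ("nails", "s2"): 4, ("tape", "s1"): 6}

# DEMAND IS CURRENTORDER + BACKORDER, applied to records with matching indices.
DEMAND = {index: CURRENTORDER[index] + BACKORDER[index]
          for index in CURRENTORDER if index in BACKORDER}


def sum_for_each(flow, key_position):
    """Reduce a flow by summing the data of all records that share one key."""
    result = defaultdict(int)
    for index, datum in flow.items():
        result[(index[key_position],)] += datum
    return dict(result)


# ITEMDEMAND IS THE SUM OF DEMAND FOR EACH ITEM-ID
ITEMDEMAND = sum_for_each(DEMAND, key_position=0)
print(ITEMDEMAND)
```

The reduced flow keeps a one-component index, mirroring the move from an (item-id, store-id) index to an (item-id) index.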

I.1.5 Additional Information 

The computational part of a data processing system can be described by giving a full set of 
flow equations of the type shown above. To complete the system's description additional data and 
timing information must be given: 

- for each flow, the components of its index, the type of its data value, and the 
periodicity with which it is computed 

- for each key its type 

- for each period its time relation to other periods 

I.2 Iteration Sets and Explicit HIBOL 

A flow expression, as explained above, represents a set of records obtained by the 
record-by-record application of a formula to the records of the flows that appear as terms in the expression. 
In this paper we shall be interested in exactly those index values (and thus records) for which the 
indicated formula is applied. The set of these index values is termed the iteration set.² 

The HIBOL language is rather informal about specifying Iteration sets. It contains 
abundant provisions (through the use of defaults) for implicit semantics based on the presence or 
absence of records in the flows appearing in flow expressions. For example, the HIBOL flow 
expression 

    CURRENTORDER + BACKORDER 



2 After Baron [1]. 
describes a flow that has a record for each index value for which either CURRENTORDER or 
BACKORDER (or both) has a record: 

- if both flows have a record for a given index value, the resultant flow has a record with the 
same index value, whose datum is the sum of those of the corresponding records in the two 
flows; 

- if only one flow has a record for a given index value, the resultant flow has a record with the 
same index value and the same datum value; 

- otherwise there is no record in the resultant flow. 

One way of looking at the semantics of addition in HIBOL, then, is to adopt the convention that the 
operation + is performed if and only if at least one of its operands is present and that each missing 
operand is treated as if it were the additive identity (0). 
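
The equivalence of the two readings (three cases versus "missing operand is 0") can be checked with a small Python sketch. Flows are modelled as dictionaries from index tuples to data; that model, the sample records, and the function names `add_by_cases` and `add_with_identity` are assumptions of this illustration only.

```python
def add_by_cases(a, b):
    """Addition spelled out as the three cases listed above."""
    result = {}
    for index in a.keys() | b.keys():      # every index present in either flow
        if index in a and index in b:
            result[index] = a[index] + b[index]
        elif index in a:
            result[index] = a[index]
        else:
            result[index] = b[index]
    return result                          # indices in neither flow get no record


def add_with_identity(a, b):
    """Addition where a missing operand is treated as the additive identity 0."""
    return {index: a.get(index, 0) + b.get(index, 0)
            for index in a.keys() | b.keys()}


# Made-up sample data, indexed by (item_id, store_id).
CURRENTORDER = {("nails", "s1"): 5, ("tape", "s2"): 1}
BACKORDER = {("nails", "s1"): 3, ("glue", "s1"): 2}

DEMAND = add_by_cases(CURRENTORDER, BACKORDER)
print(DEMAND == add_with_identity(CURRENTORDER, BACKORDER))
```

Both functions iterate over the union of the two index sets, which is exactly the set of index values for which the formula is applied.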

Although such conventions are convenient in writing HIBOL, for the sakes of clarity and 
rigor, we require fully explicit iteration set specifications. Such can be obtained through the 
thorough use of the HIBOL primitives IF and PRESENT. Thus, the fully explicit form of the 
above HIBOL flow expression would be 

    CURRENTORDER + BACKORDER IF CURRENTORDER PRESENT 
                             AND BACKORDER PRESENT 
    ELSE CURRENTORDER IF CURRENTORDER PRESENT 
    ELSE BACKORDER IF BACKORDER PRESENT 

Here the index values for which the flow expression's formula is to be applied have been made 
explicit by restructuring it as a three-clause conditional expression in terms of three 
sub-expressions, each of whose iteration sets is specified by an associated condition on the presence of 
records in the flows involved. This is a legal HIBOL flow expression, although in view of the 
existing conventions it is overspecified (redundant). For our purposes we will distinguish a 
description is declarative in nature: it describes the relationships among the flows. An implemented 
data processing system is procedural in nature: it must describe in detail how the flows are 
computed. The flow equations must be reinterpreted as basic computation steps (with an output 
flow and one or more flows as inputs) and constraints on the order in which these computations 
can be performed (the computation producing a flow must be performed before any computations 
using that flow) must be made explicit. 
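
The ordering constraint can be illustrated with a topological sort. The following is only a sketch, not a description of how ProtoSystem I derives its ordering: it assumes Python 3.9 or later (for the standard `graphlib` module) and uses the two flow equations of the store example, with each computed flow mapped to the flows its equation reads.

```python
from graphlib import TopologicalSorter

# Each computed flow maps to the set of flows its equation uses as inputs.
equations = {
    "DEMAND": {"CURRENTORDER", "BACKORDER"},
    "ITEMDEMAND": {"DEMAND"},
}

# static_order() yields every node (inputs included) with producers first;
# keep only the flows that are actually computed by some equation.
order = [flow for flow in TopologicalSorter(equations).static_order()
         if flow in equations]
print(order)
```

Any order in which DEMAND precedes ITEMDEMAND satisfies the constraint; with only these two equations that order is forced.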
Design:⁴ 

The implementation will make use of files of data to be processed by job steps which will in 
turn create other files. Each file will contain the information represented by one or more flows; 
each job step will perform the processing to satisfy one or more flow equations. The design of each 
file (information contained, organization, storage device, record sort order) and of each job step 
(equations implemented, loop structure, accessing methods used) should be made in such a way as 
to minimize some overall cost measure (e.g. dollars-and-cents cost, time used, number of secondary 
storage I/O events) for the execution of the data processing system. Typically this requires dynamic 
(behavioral) analysis of tentative design configurations. 
Code Generation: 

The system's design must be coded in a supported high-level language so that it can be 
executed. 

I.4 Data Driven Loops 

Each flow equation represents a computation whose implementation is essentially iterative in 



4 In ProtoSystem I the design process is performed by the Optimising Designer module. 
any set of values for a particular index an index set, and we distinguish two special kinds of index 
sets:⁶ 

The set of index values for which a flow F contains a record is called the index set of F 
(denoted IS(F)). 

The set of index values for which an input flow F_i contains a record that will be used in 
generating a record of the output flow F is called the critical index set of F_i with respect to F 
(denoted CIS_F(F_i)). 

These two should not be confused. CIS_F(F_i) for some flow F will often be a proper subset of 
IS(F_i).⁷ 

The problem we face is that of finding some way of enumerating the critical index sets of 
each input so that the loop can be properly driven.⁸ It is generally impractical to use the set of all 
possible (legal) index values for which an input might have a record. For one thing, this set may 
be unbounded. Even if it is finite and enumerable, it will often be much larger than the critical 
index set and thus grossly inefficient. In the DEMAND flow equation example given above, for 
instance, the critical index set of the input flow CURRENTORDER is likely to be orders of magnitude 
smaller than its maximum possible size (the case where every store has orders for every item). 
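
The size gap between the legal index space, an index set, and a critical index set can be made concrete with a small Python sketch. Everything in it is assumed for illustration: the key universes of 1000 items and 50 stores, the sample records, and the second equation (named BOTH here), which is a hypothetical equation producing a record only where both inputs are present.

```python
# Hypothetical key universes; the sizes are illustrative only.
items = [f"item{i}" for i in range(1000)]
stores = [f"store{j}" for j in range(50)]
legal_index_count = len(items) * len(stores)   # every store orders every item

# Made-up sample flows, indexed by (item_id, store_id).
CURRENTORDER = {("item3", "store1"): 5, ("item7", "store2"): 2}
BACKORDER = {("item3", "store1"): 1, ("item9", "store4"): 6}


def index_set(flow):
    """IS(F): the index values for which flow F contains a record."""
    return set(flow)


# DEMAND IS CURRENTORDER + BACKORDER uses every CURRENTORDER record,
# so the critical index set of CURRENTORDER equals its index set.
cis_demand = index_set(CURRENTORDER)

# Hypothetical equation:
#   BOTH IS CURRENTORDER + BACKORDER IF CURRENTORDER PRESENT AND BACKORDER PRESENT
# Only CURRENTORDER records with a matching BACKORDER record are used.
cis_both = index_set(CURRENTORDER) & index_set(BACKORDER)

print(legal_index_count, len(cis_demand), len(cis_both))
```

In the hypothetical BOTH equation the critical index set is a proper subset of IS(CURRENTORDER), while both are tiny compared with the legal index space.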

A much more efficient way of enumerating a set of index values that is assured to cover the 
critical index sets of the inputs is to use the union of the index sets of the input flows. This will 
work because a record of the output can be produced only if there is some input flow in which that 



6 Unfortunately, this terminology is at variance with that used by Baron in his thesis [1]. 
Baron uses the term "critical index set" to mean what we call the "index set". 

7 On no account, of course, can it be other than a proper or improper subset of IS(F_i). 

8 This statement is somewhat oversimplified, but it will suffice for now. A fully precise 
statement of the problem is given by the Fundamental Driving Constraint in Part IV. 

Part II: Structure of Data Driven Loops 
Before a general treatment of data driven loops can be developed it is necessary to examine 
the structures of the loops encountered in the HIBOL system. We begin by presenting a taxonomy 
of computation types and their corresponding loop implementations. 

II.1 Loop Terminology 

Before discussing loop structures it is useful to establish some terminology. By the term loop 
we mean a control construct which somehow enumerates a set of values for a loop-index and which 
performs a fixed sequence of statements (its body) once for each value of the loop-index. A loop 
may contain one or more loops within its body. The inner loops are said to be nested within the 
outer (enclosing) loop and the structure as a whole is called a nested loop structure. Each enclosure 
defines a different level of the nested loop structure. The degenerate case of a nested loop structure, 
where there is no loop in the body of the outer loop, is called a single-level loop, since there is only 
one loop level. 

A totally nested loop is a nested loop structure whose component loops are totally ordered 
under enclosure (i.e. for any two loops L_1 and L_2 either L_1 is inside L_2 or L_2 is inside L_1). 

II.2 Kinds of Computations and Their Loops 

Each run (computation, job step, program) in the implementation produced for a HIBOL 
description of a data processing system is essentially a loop that iterates over the records of its input 
files to generate records of its output file(s). The structure of this loop depends on the nature of the 
computation being performed. We will begin with computations that directly implement single 
HIBOL flow equations of various types. Then we will consider computations that implement more 
than one flow equation (aggregated computations) simultaneously. 
    for each (employee-id) from HOURS 
        get HOURS(employee-id) 
        PAY(employee-id) = 
            if defined[HOURS(employee-id)] 
               and not (HOURS(employee-id) > 40) 
            then HOURS(employee-id) * 3.0 
            else if defined[HOURS(employee-id)] 
            then 120.0 + (HOURS(employee-id) - 40) * 4.5 
            else undefined 
        if defined[PAY(employee-id)] 
        then write PAY(employee-id) 
    end 

The for-end construct represents the basic iteration over values of the index employee-id. It 
specifies that the values for the index are obtained from the HOURS flow. For each index value, the 
corresponding record of HOURS is read, the corresponding record of PAY is generated, and (if 
generation was successful) that record is written out. Notice that the PAY calculation is a direct 
translation from the HIBOL flow equation. 

For reasons of exposition the loop implementation presented here is of the most general form. 
An actual implementation would incorporate various efficiency enhancing improvements.⁹ 
Nevertheless, we shall continue to use such forms to show explicitly where I/O and testing occur 
conceptually. 



9 For instance, since the for has to read the next record of the driver to get the current index 
value, the get could be omitted. Furthermore, the defined tests in the PAY calculation could be 
omitted since they are testing the presence of a record which must be present. Finally, in this 
computation, the check before output could also be omitted. 
II.2.2 Matching Computations 

A matching computation computes a non-reduction flow expression involving two or more 
flows. Thus it is similar to a simple computation, but instead of operating on a single record of a 
single input flow to produce an output record, it operates on a set of corresponding records, one 
from each input flow. Correspondence is established by common index values. The name 
"matching computations" derives from the necessity of matching up the records of the inputs by 
index values before they can be operated on. 

Two sub-classes of matching computations can be distinguished depending on whether all of 
the inputs have indices with identical key-tuples or not. 

II.2.2.1 Expressions Involving Flows with a Uniform Index 

Consider a pay calculation similar to that given above, but where employees are paid 
various hourly rates. Let RATE be a flow, indexed by (employee-id), each of whose records has as 
its datum the hourly pay rate for the employee indicated by its index value. The pay calculation 
then becomes 

    PAY IS HOURS * RATE IF HOURS PRESENT 
                        AND RATE PRESENT 
                        AND NOT HOURS > 40 
           ELSE 
           RATE * 40 + 
           (HOURS - 40) * 1.5 * RATE IF HOURS PRESENT 



HOURS and RATE have identical indices, each consisting of the single key "employee-id". The loop 
that implements such a computation has a single level. 

Because a record of the output is generated only if there is a record in the HOURS file, that 
file alone is sufficient to drive the loop. (Alternatively, by similar reasoning, the RATE file could be 
used to drive the loop.) This is the simplest case of a matching computation because only one 
input is needed to drive the loop. (The computation of the flow S above is also of this type.) On 
each iteration the next record of the HOURS file is read, the corresponding RATE record is fetched, 
and the computation of gross pay performed. 

This loop is represented in the SEAL language thus: 
    for each (employee-id) from HOURS 
        get HOURS(employee-id) 
        get RATE(employee-id) 
        PAY(employee-id) = 
            if defined[HOURS(employee-id)] 
               and defined[RATE(employee-id)] 
               and not (HOURS(employee-id) > 40) 
            then HOURS(employee-id) * RATE(employee-id) 
            else if defined[HOURS(employee-id)] 
                    and defined[RATE(employee-id)] 
            then RATE(employee-id) * 40 + 
                 (HOURS(employee-id) - 40) * RATE(employee-id) * 1.5 
            else undefined 
        if defined[PAY(employee-id)] 
        then write PAY(employee-id) 
    end 

Again, the defined checks on the driver, HOURS, are superfluous. But those on RATE are necessary 
(to determine whether the corresponding get was successful) and the defined check on PAY is 
necessary (so that a record is written if and only if a datum was generated). 
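
A rough Python analogue of this single-driver loop is sketched below. It is not SEAL and not generated code: flows are modelled as dictionaries, `None` stands in for "undefined", and the function name `compute_pay`, the sample data, and the `regular_hours` parameter (defaulting to a 40-hour threshold) are assumptions of the sketch.

```python
def compute_pay(hours, rate, regular_hours=40):
    """Drive the loop from HOURS alone; RATE is fetched by matching index."""
    pay = {}
    for employee_id in sorted(hours):        # HOURS is the driver
        h = hours[employee_id]               # always present for the driver
        r = rate.get(employee_id)            # the fetch may fail: None if absent
        if r is None:
            value = None                     # no RATE record, so PAY is undefined
        elif not h > regular_hours:
            value = h * r
        else:
            value = r * regular_hours + (h - regular_hours) * r * 1.5
        if value is not None:                # write only if a datum was generated
            pay[employee_id] = value
    return pay


# Made-up sample data: employee e3 has hours but no rate.
HOURS = {"e1": 30, "e2": 50, "e3": 10}
RATE = {"e1": 4.0, "e2": 2.0}
PAY = compute_pay(HOURS, RATE)
print(PAY)
```

The presence test on the driver has been dropped, as the text says it may be, while the tests on RATE and on the output remain.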

Now consider the HIBOL flow equation for the DEMAND flow given above 
These details are implicit in the SEAL representation of the loop which is simply: 

    for each (item-id, store-id) from CURRENTORDER, BACKORDER 
        get CURRENTORDER(item-id, store-id) 
        get BACKORDER(item-id, store-id) 
        DEMAND(item-id, store-id) = ... 
        if defined[DEMAND(item-id, store-id)] 
        then write DEMAND(item-id, store-id) 
    end 
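
One way to picture a loop driven by two flows is a merge of their sorted index sequences, so that each index value occurring in either driver is visited once. The Python sketch below illustrates that idea under stated assumptions: both drivers are sorted on the same index, flows are modelled as dictionaries, and the helper name `drive` and the sample data are hypothetical. It is not a rendering of the code ProtoSystem I produces.

```python
import heapq
from itertools import groupby


def drive(*sorted_index_streams):
    """Yield each index value found in any sorted driver, once, in order."""
    merged = heapq.merge(*sorted_index_streams)   # lazy merge of sorted inputs
    for index, _duplicates in groupby(merged):    # collapse indices seen twice
        yield index


# Made-up sample flows, indexed by (item_id, store_id).
CURRENTORDER = {("glue", "s1"): 2, ("nails", "s1"): 5}
BACKORDER = {("nails", "s1"): 3, ("tape", "s2"): 1}

DEMAND = {}
for index in drive(sorted(CURRENTORDER), sorted(BACKORDER)):
    current = CURRENTORDER.get(index)             # None if the get fails
    back = BACKORDER.get(index)
    if current is not None and back is not None:
        DEMAND[index] = current + back
    elif current is not None:
        DEMAND[index] = current
    else:
        DEMAND[index] = back                      # the driver guarantees one is present
print(DEMAND)
```

Because the drivers are merged rather than concatenated, the output records come out in index order, which matters if a later job step relies on sorted input.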

II.2.2.2 General Discussion of Expressions Involving Flows with Mixed Indices 

The treatment of mixed-index flow expressions in this paper will be restricted to those that 
are legal in HIBOL. The restrictions that HIBOL imposes are made for good reasons. A brief 
discussion of the various conceivable types of mixed-index flow expressions is presented here in 
order to show the motivation behind these restrictions. 

The various cases where the flows in a flow expression have mixed indices (i.e. their indices 
have different key-tuples) can be distinguished by the set interrelationships among the key-tuples. 

Consider the case where flows have disjoint key-tuples (e.g. (w, x) and (y, z)). 
Correspondence among records of such flows is meaningless, so we do not allow them to appear in 
the same flow expression. 

Now consider the more general case where there is intersection among index key-tuples, but 
the union of their pair-wise intersections is not identical to their (simple) union. In this case 
correspondence is always ambiguous. For example, consider the two flows: A with index (x, y) and 
B with index (y, z). Suppose that there are records in A for the particular index values (x_1, y_1) and 
(x_2, y_1) and that there are records in B for index values (y_1, z_1), (y_1, z_2) and (y_1, z_3). Which of A's 
records correspond to which of B's records?¹² 

For correspondence to be meaningful and unambiguous it must be the case that the union of 
the pair-wise intersections of the key-tuples of the indices involved is identical to their union. This 
is always the case when there exists an index among the flows involved whose key-tuple is a 
superset of all the key-tuples of the other flows. 

To be sure, there are other ways of satisfying the condition of the preceding paragraph. 
These involve conjunctions of three or more indices. Consider, for instance, the three flows: A with 
index (x, y); B with index (y, z); and C with index (x, z). Corresponding triplets are all unique and 
unambiguous, of the form (x_i, y_j), (y_j, z_k), (x_i, z_k). For the sake of simplicity, however, this case is 
prohibited in HIBOL. 

II.2.2.3 Mixed-Index Flow Expressions Allowed in HIBOL 

It is possible in HIBOL to apply operators to two or more flows having different indices as 
long as each index is a sub-index of the index of some unique flow involved (i.e. as long as the 
key-tuple of each index is a subset of the key-tuple of the index of the unique flow). Clearly, the 
index of this unique flow is identical to the index of the flow expression as a whole. HIBOL 
allows a mixed-index flow expression only if its computation can be driven by the set of those flows 
involved having indices identical to that of the flow expression. 



12 Of course, we could allow all pairs to match (in Cartesian product fashion) so that the 
expression A + B would represent the six possible combinations of additions for these 5 index 
values; but this would change (extend) the semantics of HIBOL. 
For example, suppose we want to calculate the extended price¹³ of the current store orders 
(the flow CURRENTORDER) in our store chain example. Let PRICE be a flow indexed by (item-id), 
each of whose records has as its datum the per-item price associated with the item identified by its 
index. The flow equation for EXTENDEDPRICE, indexed by (item-id, store-id), would be expressed in 
HIBOL thus: 

    EXTENDEDPRICE IS CURRENTORDER * PRICE IF CURRENTORDER PRESENT 
                                          AND PRICE PRESENT 

The intent here is: for every record in CURRENTORDER find the corresponding record in PRICE and, 
if the latter is present, multiply their respective data to calculate the datum of a corresponding 
record in EXTENDEDPRICE. Notice that because PRICE and CURRENTORDER have different indices 
((item-id) and (item-id, store-id), respectively) the notion of correspondence must be extended in a 
natural way from pure identity of index values. We convene that for a particular value of item-id 
the index (item-id) matches any index (item-id, store-id) with the same value of item-id, regardless 
of the value of store-id. This augmented definition of correspondence is extended to the general 
case, where the key-tuple of one index is a subset of the key-tuple of the other. That is, for given 
values of k_1, ..., k_m the index (k_1, ..., k_m) is said to match any instance of an index 
(k_1, ..., k_m, k_m+1, ..., k_n) with the same values of k_1, ..., k_m regardless of the values of k_m+1, ..., k_n. 

Since a set of input flows, each with index identical to the flow expression's, can be used to 
drive a mixed-index matching computation, its implementation is similar to that for a 
uniform-index matching computation: the sorted drivers are read in such a way as to enumerate the critical 
index sets of all of the input flows; the resulting index values are used to fetch records from the rest 
of the inputs (including all those whose indices are sub-indices of the flow expression's index). 
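
The extended notion of correspondence amounts to projecting the full index onto the keys of the sub-index before the fetch. The Python sketch below shows this for the EXTENDEDPRICE equation; the dictionary model of flows, the helper name `project`, and the sample data are assumptions made for illustration.

```python
def project(index, positions):
    """Project a full index tuple onto the key positions of a sub-index."""
    return tuple(index[p] for p in positions)


# Made-up sample data. CURRENTORDER is indexed by (item_id, store_id),
# PRICE by the sub-index (item_id,).
CURRENTORDER = {("nails", "s1"): 5, ("nails", "s2"): 2, ("tape", "s1"): 1}
PRICE = {("nails",): 0.5, ("glue",): 3.0}

EXTENDEDPRICE = {}
for index in sorted(CURRENTORDER):                # the full-index flow drives
    price = PRICE.get(project(index, (0,)))       # match on item_id only
    if price is not None:                         # PRICE PRESENT
        EXTENDEDPRICE[index] = CURRENTORDER[index] * price
print(EXTENDEDPRICE)
```

One PRICE record matches several CURRENTORDER records (both orders for nails), and a PRICE record with no matching order (glue) contributes nothing to the output.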



13 The extended price of a quantity ordered is the product of the quantity and the per-item 
price. 




Basically, the outer loop chooses a value of the sub-index (item-id) and fetches the corresponding 
PRICE record. Then it performs the inner loop. Within the inner loop the value of the item-id key 
is held constant. All corresponding records of CURRENTORDER are read and the computation 
described in the flow equation is performed, using the data of these records together with the datum 
of the PRICE record fetched in the outer loop. The results are used to build and output the 
corresponding records of EXTENDEDPRICE. This process is repeated until the flows are exhausted. 

In detail the implementation is as follows. Before either loop is entered a record of 
CURRENTORDER is read. The outer loop uses this record to obtain the first value of the sub-index 
(item-id) and fetches the corresponding record from PRICE. Then it performs the inner loop. The 
inner loop uses the current record of CURRENTORDER and continues to read records sequentially 
from CURRENTORDER until the sub-index is observed to change or an end-of-file condition occurs. 
When either of these conditions occurs, it exits to the outer loop. If an eof has occurred, the outer 
loop exits. Otherwise it iterates, using the sub-index value of the current CURRENTORDER record as 
the new value to be held constant in the inner loop, fetching the corresponding PRICE record and 
performing the inner loop again. 

The corresponding SEAL code is: 
    for each (item-id) from CURRENTORDER 
        get PRICE(item-id) 
        for each (store-id) from CURRENTORDER(item-id) 
            get CURRENTORDER(item-id, store-id) 
            EXTENDEDPRICE(item-id, store-id) = 
                if defined[CURRENTORDER(item-id, store-id)] 
                   and defined[PRICE(item-id)] 
                then CURRENTORDER(item-id, store-id) * PRICE(item-id) 
                else undefined 
            if defined[EXTENDEDPRICE(item-id, store-id)] 
            then write EXTENDEDPRICE(item-id, store-id) 
        end 
    end 

Notice that the outer loop is driven by CURRENTORDER (the whole flow), but that the inner loop is 
driven by CURRENTORDER(item-id) (the sub-flow of CURRENTORDER consisting of just those records 
whose indices correspond to the value of the sub-index (item-id) fixed by the outer loop). What 
this means is that for the outer loop the next value of the sub-index (item-id) will be taken from 
the next record of the CURRENTORDER flow. But for the inner loop the next value for the sub-index 
(store-id) will be taken from the next record of the sub-flow of CURRENTORDER corresponding to the 
current value of (item-id): if there are no further records in CURRENTORDER for this fixed value of 
(item-id) this will be treated just like an end-of-file condition and the iteration of the inner loop 
will terminate. Thus the inner loop is driven by a succession of sub-flows, one for each iteration of 
the outer loop. 
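
The read-ahead mechanics described above can be sketched in Python as follows. This is an illustration under assumptions, not the generated implementation: CURRENTORDER is taken to be a list of records already sorted by item-id, PRICE is modelled as a dictionary keyed by item-id, `None` plays the role of the end-of-file condition, and the function name `extended_price` and the sample data are hypothetical.

```python
def extended_price(current_order_records, price):
    """current_order_records: ((item_id, store_id), quantity) pairs sorted by item_id."""
    records = iter(current_order_records)
    output = []
    record = next(records, None)                  # read ahead before either loop
    while record is not None:                     # outer loop: one pass per item_id
        item_id = record[0][0]                    # sub-index value to hold constant
        unit_price = price.get(item_id)           # fetch PRICE; None if absent
        # Inner loop: consume the sub-flow of records sharing this item_id.
        while record is not None and record[0][0] == item_id:
            (_, store_id), quantity = record
            if unit_price is not None:            # PRICE PRESENT
                output.append(((item_id, store_id), quantity * unit_price))
            record = next(records, None)          # None acts as end-of-file
    return output


# Made-up sample data; glue has no PRICE record.
CURRENTORDER = [(("glue", "s1"), 2), (("nails", "s1"), 5),
                (("nails", "s2"), 2), (("tape", "s1"), 1)]
PRICE = {"nails": 0.5, "tape": 2.0}
EXTENDEDPRICE = extended_price(CURRENTORDER, PRICE)
print(EXTENDEDPRICE)
```

The inner loop ends either when the item-id changes or when the input is exhausted, and the outer loop distinguishes the two cases by testing the read-ahead record.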

This nested-loop implementation scheme is easily extended to 3 or more loop levels when 
appropriate sorting constraints hold among the flows involved. For example, suppose that there 
are 3 flows involved: A with index (k_1, k_2, k_3); B with index (k_1, k_2); and C with index (k_1). And 
suppose further that B is sorted by k_1 and that A is sorted first by k_1 and, within segments 
corresponding to a fixed value of k_1, the records of A are further sorted by k_2. Then the flow 
equation can be implemented using a nested loop structure involving 3 loops (innermost loop, 
middle loop and outermost loop). The outermost loop chooses a value for the key k_1 to be held 
constant within the middle loop (and perforce in the innermost loop, which is contained in the 
middle loop). It also fetches the corresponding record of C for use within the contained loops. 
Then it executes the middle loop, which, in turn, chooses a value for the key k_2 to be held constant 
within the inner loop. The middle loop also fetches the corresponding record of B for use within 
the innermost loop. Then it executes the innermost loop. In the innermost loop the values of the 
keys k_1 and k_2 are held constant. The innermost loop reads all corresponding records of A, using 
their data and those of the already read records to perform the calculations described in the flow 
equation and to build and output the records of the output flow. When the innermost loop has 
read and processed all records of A corresponding to the fixed values of k_1 and k_2, it exits to the 
middle loop, which chooses a new value for k_2 and iterates. When the middle loop has exhausted 
all possibilities for the value of k_1 fixed in it, it returns to the outermost loop, which chooses a new 
value of k_1 and iterates. This loop structure expressed in the SEAL language looks like 
treated as a single flow. 

Conceptually, the argument flow is partitioned into subsets (sub-flows) by an equivalence 
relation defined on the sub-index (a key or keys) indicated in the FOR EACH clause; then the 
reduction operator is applied to the members of each subset to generate the value of the datum of 
the output record corresponding to that subset. For instance, in the first example given above the 
DEMAND flow is conceptually partitioned into record subsets by item-id. Thus, all records in DEMAND 
whose index contains the value item-id_1 for the item-id key are in one subset, all records for item-id 
= item-id_2 are in another, and so forth (empty subsets are ignored). The datum for the record in 
ITEMDEMAND with index = (item-id_j) is calculated by summing all of the data in the records in the 
subset corresponding to item-id = item-id_j. 

Conceptually, the implementing iteration for a simple reduction expression in a single flow 
consists of two loops, one nested inside the other. The inner loop implements the application of the 
indicated reduction operation to a subset of the input's records. Within this loop the value of the 
sub-index defining the subset is held constant. Returning to the SUM OF DEMAND example, the 
inner loop implements the summation of the data of the records of each subset of DEMAND. That is, 
the inner loop is performed for each value of item-id for which there are records in DEMAND. 
Within the inner loop the particular value of the key item-id is held constant, all records of DEMAND 
corresponding to that key value are fetched and their data are summed. 

The outer loop performs clerical work. It chooses a value of the subsetting sub-index (e.g. a 
value of item-id), executes the inner loop (which fetches records of the input corresponding to the 
chosen sub-index and, for example, adds them to the accumulator), and when the inner loop is 
finished, it uses the resulting value as the datum of the output record corresponding to the chosen 
sub-index, and writes that record out. 
It may at first seem unnecessarily baroque to initialize the accumulator sum to "undefined" in the 
outer loop, test it in the inner loop for definedness and then initialize it if undefined. In this 
simple example we could just initialize it to 0 in the outer loop and not bother with the definedness 
checks. We have chosen the former course for two reasons. First, we wish to make explicit the 
conditions under which the sum (and thus a record of the output ITEMDEMAND) is defined for a 
given value of the key item-id. Second, a little thought will show that for other reduction 
operations (viz. MAX and MIN) initialization of the accumulator must (at least conceptually) be 
postponed until the inner loop, where the initializing value is obtained by the first get. Moreover, 
in general, when computations are aggregated (see below) and more than one activity is performed 
in the inner loop, it is then possible (if some driver besides DEMAND is used) that for some values of 
item-id no sum is calculated in the inner loop and thus sum is undefined on exit from that loop. 

If the input flow is not sorted as above, the computation for a reduction operation becomes
somewhat more complex. One possibility is to create and maintain separate accumulators for each
value of the sub-index occurring in the input flow. Since the number of accumulators cannot
be known a priori (i.e. at compile time), they must be allocated dynamically (i.e. during
execution of the computation). In PL/I, for example, the following (roughly outlined) scheme might
be used:

Declare an accumulator array to have CONTROLLED storage.

Make a pre-pass through the input flow to count the number of different sub-index
values occurring.

Execute an ALLOCATE statement to define the size of the array.

Make a second pass over the input flow to perform the accumulation.

Write all accumulated values out to the output flow.

In this scheme there are two separate loops instead of a totally nested loop structure.
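The outlined scheme can be imitated in Python as a rough sketch. The function name `sum_unsorted` and the record layout `((item_id, store_id), datum)` are assumptions made for illustration; a Python list sized after the pre-pass stands in for the PL/I CONTROLLED array.

```python
def sum_unsorted(demand):
    """Sum data per item_id for an input flow in arbitrary order."""
    # Pre-pass: find the distinct sub-index values and give each a slot.
    slots = {}
    for (item_id, _store_id), _datum in demand:
        if item_id not in slots:
            slots[item_id] = len(slots)

    # Counterpart of the ALLOCATE statement: size the accumulator array.
    accumulators = [None] * len(slots)

    # Second pass: perform the accumulation.
    for (item_id, _store_id), datum in demand:
        i = slots[item_id]
        accumulators[i] = datum if accumulators[i] is None else accumulators[i] + datum

    # Write all accumulated values out to the output flow.
    return {item_id: accumulators[i] for item_id, i in slots.items()}


result = sum_unsorted([((2, "a"), 3), ((1, "a"), 5), ((2, "b"), 4), ((1, "b"), 7)])
```

The two passes are sequential rather than nested, which mirrors the remark that this scheme has two separate loops.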

Alternatively, a nested loop, multi-pass scheme could be implemented. The outer loop would

II.3.1 Formal Representation of Nested Loop Structures

We have seen that the basic control structure used in implementing a computation is the 
totally nested loop. Associated with each loop in the nesting is a set of keys that it will fix and 
which will remain constant in the loops it contains. It is easy to see that this constraint means that 
the set of keys fixed within any loop is necessarily a (proper) superset of the set of keys fixed within 
any of its enclosing loops. Thus, the set of keys fixed within a loop is sufficient to determine its 
level in the nesting. 

Now notice that the body of every loop (except the innermost one) contains exactly one top- 
level loop; thus, the body is naturally divided into three parts 

the prolog--those actions performed before the enclosed loop
the enclosed loop
the epilog--those actions performed after the enclosed loop.

Conceptually, then, a totally nested loop can be represented as a list of loop descriptions, one
for each of the component loops. Each such description would consist of a level identifier
(indicating at which level of nesting it occurs) and the prolog and the epilog. However, during the
design stage, while implementations are being developed and, in particular, when computation
aggregations are being considered, it is useful to distinguish 3 classes of actions within the body of
a loop:

Prolog--those actions that must be performed before the enclosed loop
Epilog--those actions that must be performed after the enclosed loop
General--those actions that could end up in either the prolog or the epilog

It is also useful to separate I/O actions from the other actions. Thus, we represent each loop
in the nesting as a structure of the following form: 15

(Level,
    (Inputs_P, Prolog, Outputs_P)
    (Inputs_G, General, Outputs_G)
    (Inputs_E, Epilog, Outputs_E))

where

Level indicates the depth of the loop in the nesting
Inputs_P are the files (necessarily) read in the Prolog section.
Inputs_G are the files (necessarily) read in the General section.
Inputs_E are the files (necessarily) read in the Epilog section.
Outputs_P are the outputs generated in the Prolog section (possibly used in the enclosed loop
or in the Epilog section).
Outputs_G are the outputs generated in the General section.
Outputs_E are the outputs generated in the Epilog section.

15 This representation, and the theory of computation aggregation associated with it, are due
largely to the work of R. C. Fleischer [2], who improved on the earlier work of R. V. Baron.
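One way to picture this structure is as a small record type. The following Python sketch is not the report's notation; the class names `Section` and `LoopLevel` and their field names are assumptions chosen to mirror the (Inputs, actions, Outputs) triples above.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One (Inputs, actions, Outputs) triple of a loop description."""
    inputs: set = field(default_factory=set)      # flows read in this section
    actions: list = field(default_factory=list)   # computations performed here
    outputs: set = field(default_factory=set)     # flows generated here


@dataclass
class LoopLevel:
    """Description of one loop in a totally nested loop."""
    level: tuple                                   # keys fixed at this level
    prolog: Section = field(default_factory=Section)
    general: Section = field(default_factory=Section)
    epilog: Section = field(default_factory=Section)


# A single-level loop over (employee-id) that reads HOURS and RATE and
# produces PAY, with everything placed in the General section.
pay_loop = LoopLevel(
    level=("employee-id",),
    general=Section(
        inputs={"HOURS", "RATE"},
        actions=["calculate PAY"],
        outputs={"PAY"},
    ),
)
```

Sections that are not mentioned default to empty, which matches the many "empty" entries in the loop descriptions given in the report.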

II.3.2 Computation Implementation

The implementation of a computation as a nested loop structure reduces to the problem of 
determining how many and which levels are to be in the totally nested loop and where the I/O and 
computations go. The answers to these questions are constrained by the forces of necessity and 
efficiency. 

II.3.2.1 Level Position of I/O and Calculations

The levels at which each input should be read, each output should be written and each
calculation should be performed are determined by the following considerations:

Inputs: Each input flow of a computation should be read at a loop level whose associated
key-tuple is identical to that of the flow's index (and on this account the totally nested loop for a
computation must contain a loop corresponding to the index of each input flow). It cannot be read
at a higher level because at such a level the key information is incomplete. To read it at a lower level
would be inefficient, because it would cause unnecessary rereads of the flow's records.

Outputs: Similarly, each output flow of a computation must be written at a loop level whose
associated key-tuple is identical to that of the flow's index. It cannot be written at a higher level
because of insufficient key information, and to output it at a lower level would cause multiple writes
of the records.

Calculations: A flow expression should also be calculated at a loop level whose associated
key-tuple is identical to that of the flow expression's index. Again, the key information at a higher
level would be insufficient to calculate the expression and to perform it at a lower level would be
redundant. Further economy can be realized, however, in a mixed-index flow expression if it
contains a subexpression whose associated index is a sub-index of the flow expression as a whole;
such a subexpression should be split off and calculated at the appropriate (higher) level.
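The rule that each flow is handled at the level whose key-tuple equals its index can be sketched as a grouping step. The function name `assign_levels` is hypothetical, and the indexes given for PRICE, CURRENTORDER and EXTENDEDPRICE are assumptions inferred from the loop descriptions in this report.

```python
def assign_levels(flow_indexes):
    """Group flows by index; return levels ordered outermost first.

    flow_indexes maps a flow name to its index, given as a tuple of keys.
    """
    levels = {}
    for flow, index in flow_indexes.items():
        # A flow is read, written or calculated at the level whose
        # key-tuple is identical to the flow's own index.
        levels.setdefault(frozenset(index), []).append(flow)
    # Fewer fixed keys means a higher (outer) level in the nesting.
    return sorted(levels.items(), key=lambda item: len(item[0]))


levels = assign_levels({
    "PRICE": ("item-id",),
    "CURRENTORDER": ("item-id", "store-id"),
    "EXTENDEDPRICE": ("item-id", "store-id"),
})
```

The sketch only determines which levels are needed and which flows belong to each; it does not check that the resulting levels actually nest.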

II.3.2.2 Position of I/O and Calculations Within Their Assigned Levels

The placement of a read, write or calculation within a given loop level (i.e. in either the
Prolog, Epilog or General section) should be done with a view toward imposing the minimum
constraint on implementation. If done in this manner placement preserves the maximal flexibility
in subsequent aggregation. For instance, if a calculation could go into either the Prolog or the
Epilog it should be placed in the General section. If instead it were arbitrarily placed in the
Epilog, this unnecessary constraint would preclude subsequent aggregations that would require it to
be in the Prolog (loop merging in computation aggregation is discussed below).
PAY IS RATE * HOURS IF RATE PRESENT AND HOURS PRESENT

Here, both inputs have the same index (employee-id) so there is only one loop:

Level: (employee-id)

Inputs_P: empty
Prolog: empty
Outputs_P: empty

Inputs_G: {HOURS, RATE}
General: calculate PAY
Outputs_G: {PAY}

Inputs_E: empty
Epilog: empty
Outputs_E: empty

As explained above, everything is placed in the General section.

Now consider a simple reduction flow equation:

ITEMDEMAND IS THE SUM OF DEMAND FOR EACH ITEM-ID

We have seen that the implementation of such a flow equation will always have two loop levels:

Loop 1 (outer loop)

Level: (item-id)

Inputs_P: empty
Prolog: initialize sum
Outputs_P: empty

Inputs_G: empty
General: empty
Outputs_G: empty

Inputs_E: empty
Epilog: empty
Outputs_E: {ITEMDEMAND}
Loop 1 (outer loop)

Level: (item-id)

Inputs_P: {PRICE}
Prolog: empty
Outputs_P: empty

Inputs_G: empty
General: empty
Outputs_G: empty

Inputs_E: empty
Epilog: empty
Outputs_E: empty

Loop 2 (inner loop)

Level: (item-id, store-id)

Inputs_P: empty
Prolog: empty
Outputs_P: empty

Inputs_G: {CURRENTORDER}
General: calculate EXTENDEDPRICE
Outputs_G: {EXTENDEDPRICE}

Inputs_E: empty
Epilog: empty
Outputs_E: empty
If two computations have level compatible loops and if the ordering constraints of the two
loops can be mutually satisfied in a single totally nested loop, aggregation is possible.

III.1.1 Level Compatibility Between Loops

It is easy to show that two loops are level compatible if and only if their level structures are
identical or empty levels (levels at which no actions are performed) can be inserted to make their
level structures identical. Some examples of level compatible totally nested loops (TNL's) and the
level structures of their aggregated results are: 16

        loop levels               levels in aggregate

TNL1    (K), (K,L)
TNL2    (K,L)                     (K), (K,L)

TNL1    (K,L)
TNL2    (K,L,M)                   (K,L), (K,L,M)

TNL1    (K), (K,L)
TNL2    (K,L), (K,L,M)            (K), (K,L), (K,L,M)

It is interesting to note that when aggregation occurs loop levels are neither added nor deleted; that
is, the set of loop levels in the aggregate is simply the union of the sets of loop levels in the
component computations.

Some examples of loops whose level structures are incompatible are

        loop levels

TNL1    (K)
TNL2    (L)

TNL2    (L), (K,L)

TNL1    (K), (K,L), (K,L,M)
TNL2    (K), (K,M), (K,L,M)

16 In this section the symbols K, L and M denote different keys.
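The compatibility condition can be checked mechanically. The sketch below rests on one reading of the definition: two level structures can be padded with empty levels to become identical exactly when the union of their levels is itself a chain in which each level's key set is a proper subset of the next. The function name `aggregate_levels` is hypothetical.

```python
def aggregate_levels(tnl1, tnl2):
    """Return the aggregate's level structure, or None if incompatible.

    Each argument is a list of levels; a level is a tuple of key names.
    """
    merged = sorted(
        {frozenset(level) for level in tnl1} | {frozenset(level) for level in tnl2},
        key=len,
    )
    # Every level must fix a proper superset of the keys fixed by the
    # level enclosing it, otherwise no total nesting exists.
    for outer, inner in zip(merged, merged[1:]):
        if not outer < inner:
            return None
    return merged


compatible = aggregate_levels([("K",), ("K", "L")], [("K", "L"), ("K", "L", "M")])
incompatible = aggregate_levels(
    [("K",), ("K", "L"), ("K", "L", "M")],
    [("K",), ("K", "M"), ("K", "L", "M")],
)
```

In the incompatible case the levels (K,L) and (K,M) have the same size but different keys, so neither can enclose the other.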



III.1.2 Order Constraint Compatibility Between Loops

Consider the computations for the following two flow equations:

ITEMDEMAND IS THE SUM OF DEMAND FOR EACH ITEM-ID

FRACTION IS DEMAND / ITEMDEMAND IF DEMAND PRESENT

It would seem eminently reasonable to aggregate these two computations since they have a
common input (DEMAND) and the output of the first is an input to the second. Yet they cannot be
aggregated into a totally nested loop! Their implementation descriptions reveal why. Recall that
the description of the first is:

Loop 1 (outer loop)

Level: (item-id)

Inputs_P: empty
Prolog: initialize sum
Outputs_P: empty

Inputs_G: empty
General: empty
Outputs_G: empty

Inputs_E: empty
Epilog: empty
Outputs_E: {ITEMDEMAND}
Loop 2 (inner loop)

Level: (item-id, store-id)

Inputs_P: empty
Prolog: empty
Outputs_P: empty

Inputs_G: {DEMAND}
General: calculate sum
Outputs_G: empty

Inputs_E: empty
Epilog: empty
Outputs_E: empty

The FRACTION computation also has two nested loops:

Loop 1 (outer loop)

Level: (item-id)

Inputs_P: {ITEMDEMAND}
Prolog: empty
Outputs_P: empty

Inputs_G: empty
General: empty
Outputs_G: empty

Inputs_E: empty
Epilog: empty
Outputs_E: empty

Loop 2 (inner loop)

Level: (item-id, store-id)

Inputs_P: empty
Prolog: empty
Outputs_P: empty

Inputs_G: {DEMAND}
General: do division
Outputs_G: {FRACTION}

Inputs_E: empty
Epilog: empty
Outputs_E: empty

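Clearly these computations are level compatible since they have identical level structures. But the
(item-id) level loop of the first generates ITEMDEMAND in its Epilog, whereas the (item-id) level loop
of the second requires ITEMDEMAND as an input to its Prolog. The Epilog of the first would therefore
have to precede the Prolog of the second, which is impossible within a single loop at that level.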
Computations whose totally nested loops are level compatible and satisfy the above order
constraints are aggregatable.

III.2 Merging Loops

Because each action and all I/O must be performed at the same level in the aggregate as it
was before aggregation, the loop structure of the aggregation of two computations can be obtained
through a level-by-level merge of the loop levels of the two computations to be aggregated.

The algorithm for merging two totally nested loops is:

For each loop in one:

If the other has no loop at the same level, just add the representation of that level to the
description of the aggregate.

If there is a corresponding loop, the two loops must be merged into one for the aggregate.

The full details of merging loops are complicated, but a rough sketch follows. Let the
corresponding loops be L1 and L2, where no output of L2 is an input to L1. 17 There are three
cases:

1. Some output F of the Epilog of L1 is an input to L2.

a. F is an input to L2's Prolog section: aggregation impossible.

b. F is used by an action in L2's General section: move that action to the Epilog of
the corresponding level in the aggregate along with any actions in L2's General
section which use, as input, some output produced by the action; all other actions
remain in the same sections in the aggregate as they were in L1 and L2.

c. All other cases: all actions remain in the same sections in the aggregate as they
were in L1 and L2.



17 Obviously, the case where no output of L1 is an input to L2 will be handled exactly the same,
mutatis mutandis. The remaining case, where each has some output that is an input to the other, is
impossible.
2. Some output F, generated by some action A in the General section of L1, is an input to L2.

a. F is an input to L2's Prolog section: move A from the General section to the Prolog
section of the aggregate, along with any actions in the General section which have, as
output, something used as input to that computation; all other actions remain in the
same sections in the aggregate as they were in L1 and L2.

b. All other cases: all actions remain in the same sections in the aggregate as they were
in L1 and L2.

3. Neither 1 nor 2: all actions remain in the same sections in the aggregate as they were in L1
and L2.

Basically, what this means is that a General action must move to the Prolog of the
aggregate if it must come before some action in that Prolog or if it must come before another
General action which must be moved to the Prolog; a General action must move to the Epilog if
it must come after some action in the Epilog or if it must come after another General action
which must be moved to the Epilog.
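The case analysis above can be sketched as a classification over flows. This is a simplification and not the full merge: it looks only at which sections produce and consume each flow, and it reports what must happen rather than moving individual actions. The function name `merge_decisions` and the dictionary layout are assumptions.

```python
def merge_decisions(l1_outputs, l2_inputs):
    """Classify each flow passed from loop L1 to loop L2 at one level.

    l1_outputs: {"general": set, "epilog": set} of flows generated by L1.
    l2_inputs:  {"prolog": set, "general": set, "epilog": set} read by L2.
    Assumes no output of L2 is an input to L1.
    """
    decisions = []
    all_l2_inputs = l2_inputs["prolog"] | l2_inputs["general"] | l2_inputs["epilog"]

    # Case 1: an Epilog output of L1 is an input to L2.
    for flow in sorted(l1_outputs["epilog"] & all_l2_inputs):
        if flow in l2_inputs["prolog"]:
            decisions.append((flow, "aggregation impossible"))       # case 1a
        elif flow in l2_inputs["general"]:
            decisions.append((flow, "move consumer to Epilog"))      # case 1b
        else:
            decisions.append((flow, "unchanged"))                    # case 1c

    # Case 2: a General output of L1 is an input to L2.
    for flow in sorted(l1_outputs["general"] & all_l2_inputs):
        if flow in l2_inputs["prolog"]:
            decisions.append((flow, "move producer to Prolog"))      # case 2a
        else:
            decisions.append((flow, "unchanged"))                    # case 2b

    return decisions  # an empty list corresponds to case 3


# Outer (item-id) levels of the ITEMDEMAND and FRACTION computations.
outer = merge_decisions(
    {"general": set(), "epilog": {"ITEMDEMAND"}},
    {"prolog": {"ITEMDEMAND"}, "general": set(), "epilog": set()},
)
```

For the ITEMDEMAND and FRACTION pair the outer level falls under case 1a, which agrees with the report's statement that the two cannot be aggregated into a totally nested loop.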

III.3 Non-Totally-Nested Loops

In this report the treatment of data driven loop implementations is restricted to loop
structures that are totally nested. Totally nested implementations are not only broadly applicable,
but generally simple and efficient as well. In fact they often provide the most efficient and
expeditious implementations, especially when sequentially organized files, sorted by key values, are
used. For the sake of completeness, though, something should be said here about non-totally-nested
loops. Indeed, a great deal could be said about such implementations: enough, certainly, to make
one or more separate reports. Because of this the discussion here is necessarily brief and
incomplete.

Most importantly, it should be said that non-totally-nested loop structures are by no means
inefficient or uninteresting. They are used all the time and for good, solid reasons. Their use is
perhaps most interesting when two or more computations cannot be performed entirely concurrently
(i.e. in the same loop), but they can be performed with partial concurrency. The following two
examples illustrate.

III.3.1 Example 1: Aggregating Computations with Incompatible Order Constraints
Recall the flow equations:

ITEMDEMAND IS THE SUM OF DEMAND FOR EACH ITEM-ID

FRACTION IS DEMAND / ITEMDEMAND IF DEMAND PRESENT
AND ITEMDEMAND PRESENT

and their implementing computations. We saw in Section III.1.2 that the implementing
computations for these flow equations could not be merged into a totally nested loop structure
because the inner loop for the first had to be completed before the inner loop of the second could
be performed. They can, however, be aggregated into a single loop with a structure like

for each (item-id) from DEMAND
    sum = undefined
    for each (store-id) from DEMAND(item-id)
        <calculate sum>
    end
    if defined[sum] then ITEMDEMAND(item-id) = sum
    for each (store-id) from DEMAND(item-id)
        <calculate FRACTION>
    end
end

This is a non-totally nested loop structure, since two loops (the inner ones) appear at the same level. 
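The same structure can be written out in Python as a sketch. The nested-dictionary layout for DEMAND (item-id, then store-id, then datum) and the function name are assumptions, and `None` again stands in for "undefined". A zero ITEMDEMAND would make the division fail; the sketch does not address that case.

```python
def itemdemand_and_fraction(demand):
    """Compute ITEMDEMAND and FRACTION with two sibling inner loops."""
    item_demand = {}
    fraction = {}
    for item_id, stores in demand.items():
        total = None  # sum = undefined
        # First inner loop: calculate the sum for this item-id.
        for datum in stores.values():
            total = datum if total is None else total + datum
        if total is not None:
            item_demand[item_id] = total
        # Second inner loop at the same level: it needs the finished sum,
        # so DEMAND(item-id) is traversed a second time.
        for store_id, datum in stores.items():
            if item_id in item_demand:
                fraction[(item_id, store_id)] = datum / item_demand[item_id]
    return item_demand, fraction


item_demand, fraction = itemdemand_and_fraction({1: {"a": 1.0, "b": 3.0}})
```

ITEMDEMAND is consumed by the second inner loop as soon as it is produced, so it never has to be written out and read back between the two computations.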
It is interesting to compare this aggregate implementation with the unaggregated
implementation of the two computations involved (as separate loops in separate job steps). On the
one hand, in either implementation every record of the DEMAND flow must be accessed twice, so no
accesses are eliminated by aggregation. On the other hand, accesses of the records of the
ITEMDEMAND flow are eliminated by aggregation. If the computations are implemented separately,
every record of ITEMDEMAND must be written into a file by the first computation and then read back
by the second; whereas in the aggregate implementation the records are used as they are generated,
so no re-reading is necessary. 18

In general we have seen that when two implementations are level-compatible, the only case in
which their aggregate cannot be implemented as a totally nested loop is where, for some loop level,
the output of the Epilog section of one is an input to the Prolog section of the other (as is the case
with ITEMDEMAND above). In such a case the corresponding loop level of the aggregate can be
implemented (as above) as two loops of the same level performed in sequence, and re-reads of the
flow in question will be saved.

III.3.2 Example 2: Aggregating Computations That Are Not Level-Compatible

In Section III.1.1 we saw that computations with the following level structures were not level
compatible with one another:

TNL1    (K), (K,L), (K,L,M)
TNL2    (K), (K,M), (K,L,M)

The fact that they are not level-compatible means that it is impossible to devise a total



18 In fact, if these records are not used by any other computation in the data processing system,
it is not necessary to write them out into a file either.
nesting of loops that will implement their aggregate. They might, however, be said to be partially
level-compatible, since the outermost levels have identical keys. If a common driver set can be
found for that level, they might be implemented as a non-totally-nested loop structure. The
following is a possible implementation skeleton:

for each (K) from D0
    for each (L) from D1
        for each (M) from D2

        end
    end
    for each (M) from D3
        for each (L) from D4

        end
    end
end

where the Di are distinct drivers.

This is another commonly found construct in file data processing. It is the case where, for a
common set of values for the sub-index (K), two or more independent computations are to be
performed. As in the previous example, there is some I/O saving (over separate implementations
of the computations involved) because each record of D0 has to be read only once.
IV.1 A Theory of Index Sets and Critical Index Sets for Data Driven Loops

Let us begin with some definitions and useful consequences of these definitions. 

IV.1.1 Definitions and Useful Lemmas

We redefine the notions of a flow's index set and critical index set formally and introduce the
operators Proj, Inj and Restr:

Definition: The index set of a flow F with index I is defined as

$$\mathrm{IS}(F) = \{\, i \mid \text{there is a record in } F \text{ for } i \,\}$$

Definition: The critical index set of a flow F (with index I) with respect to a flow X is defined as

$$\mathrm{CIS}_X(F) = \{\, i \mid \text{there is a record in } F \text{ for } i \text{ that is necessary to generate some record in } X \,\}$$

Definition: The projection of an index set S with index $(k_1, \ldots, k_m, k_{m+1}, \ldots, k_n)$ onto the
sub-index $(k_1, \ldots, k_m)$ is defined as

$$\mathrm{Proj}(S, (k_1, \ldots, k_m)) = \{\, (k_1, \ldots, k_m) \mid \exists (k_{m+1}, \ldots, k_n) \text{ such that } (k_1, \ldots, k_m, k_{m+1}, \ldots, k_n) \in S \,\}$$

Definition: The injection of an index set S with index $(k_1, \ldots, k_m)$ by the index set T with
super-index $(k_1, \ldots, k_m, k_{m+1}, \ldots, k_n)$ is defined as

$$\mathrm{Inj}(S, T) = \{\, (k_1, \ldots, k_m, k_{m+1}, \ldots, k_n) \mid (k_1, \ldots, k_m) \in S \wedge (k_1, \ldots, k_m, k_{m+1}, \ldots, k_n) \in T \,\}$$

Definition: The restriction of an index set S with index $(k_1, \ldots, k_n)$ by the condition C (whose
truth depends on the values of the keys $k_1, \ldots, k_n$) is defined as

$$\mathrm{Restr}(S, C) = \{\, (k_1, \ldots, k_n) \in S \mid C \text{ is true} \,\}$$
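For finite index sets the three operators have direct counterparts on Python sets of tuples. The following is a minimal sketch, assuming that the keys of a sub-index are the leading components of the super-index (as in the definitions above); the function signatures, including the explicit length argument `m` of `inj`, are choices made for the illustration.

```python
def proj(s, m):
    """Projection of index set s onto its first m keys."""
    return {index[:m] for index in s}


def inj(s, t, m):
    """Injection of s (index of m keys) by t (super-index of s)."""
    return {index for index in t if index[:m] in s}


def restr(s, condition):
    """Restriction of s by a condition on the key values."""
    return {index for index in s if condition(*index)}


# Hypothetical index sets: C has index (item-id, store-id), P has (item-id).
C = {(1, "a"), (1, "b"), (2, "a")}
P = {(1,), (3,)}

projected = proj(C, 1)
injected = inj(P, C, 1)
restricted = restr(C, lambda item_id, store_id: store_id == "a")
```

Note that `inj(P, C, 1)` keeps only those index values of C whose item-id also occurs in P, so the result is always a subset of C.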

From the last three definitions the following simple but useful results (stated without proof)
can be obtained:

Lemma 1: If A is an index set with index I , then 
Corollary 1: Let F be defined as in Theorem 1. Then for any flow $F_j$ with index identical to
that of F,

$$\mathrm{CIS}_F(F_j) = \mathrm{IS}(F)$$

Theorem 2: If R is a flow (with index $I_R$) described by the application of a reduction operator
to a flow expression expr in terms of the flows $F_1, \ldots, F_m$ where each flow $F_j$ has index $I_j$ (e.g.
the flow equation for R is: R IS SUM OF expr FOR EACH <I_R>), then

$$\mathrm{CIS}_R(F_j) = \mathrm{Proj}(\mathrm{IS}(\mathit{expr}), I_j)$$

(Note that the index of expr must be a super-index of $I_j$.)

This theorem simply says that when a flow (as that described by expr) is reduced every record of
that flow is used in calculating the result. From Theorem 1 we have in turn that the critical index
set of each $F_j$ with respect to the flow to be reduced is given by the expression on the right-hand
side of the above equation.

Corollary 2: If R is a flow (with index $I_R$) described by the application of a reduction operator
to a flow F (e.g. R IS SUM OF F FOR EACH <I_R>), then

$$\mathrm{CIS}_R(F) = \mathrm{IS}(F)$$

The following theorems concern the nature of the index sets of flow expressions. First, a
simple result about flows described by reduction:

Theorem 3: If R is a flow (with index $I_R$) described by the application of a reduction operator
to a flow expression expr (e.g. the flow equation for R is: R IS SUM OF expr FOR EACH <I_R>),
then

$$\mathrm{IS}(R) = \mathrm{Proj}(\mathrm{IS}(\mathit{expr}), I_R)$$
$$\mathrm{IS}(\mathrm{safe}[F_1, \ldots, F_n]) = \begin{cases} \mathrm{IS}(F_1) & \text{if } n = 1 \\ \ldots & \text{if } n > 1 \end{cases}$$



As mentioned above the only legal arithmetic flow expression in FE-HIBOL is a safe or a
safe further qualified by some condition. This further qualification must take the form of a logical
expression ANDed with the safe. Thus, to complete our treatment of arithmetic flow expressions we
only need the following simple theorem:

Theorem 5: The index set of a simple arithmetic flow expression safe qualified by the
condition C is given by

$$\mathrm{IS}(\text{safe AND } C) = \mathrm{Restr}(\mathrm{IS}(\text{safe}), C)$$

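Consideration of special cases leads to three simple corollaries:

Corollary 4: By Lemmas 2 and 5

$$\mathrm{IS}(\text{safe AND } G \text{ PRESENT}) = \mathrm{Inj}(G, \mathrm{IS}(\text{safe})) = \mathrm{IS}(\text{safe}) \cap \mathrm{Inj}(G, \mathrm{IS}(\text{safe}))$$

Corollary 5:

$$\mathrm{IS}(\text{safe AND } (C_1 \text{ AND } C_2)) = \mathrm{Restr}(\mathrm{IS}(\text{safe}), C_1) \cap \mathrm{Restr}(\mathrm{IS}(\text{safe}), C_2)$$

Corollary 6:

$$\mathrm{IS}(\text{safe AND } (C_1 \text{ OR } C_2)) = \mathrm{Restr}(\mathrm{IS}(\text{safe}), C_1) \cup \mathrm{Restr}(\mathrm{IS}(\text{safe}), C_2)$$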

For conditional expressions with two cases 19 we have the following result:

Theorem 6: Let E be a conditional flow expression of two terms:

E = expr1 IF C1
    ELSE expr2 IF C2

19 The extension of this theorem to more than two cases is trivial.
In level 1 we have the output R and the driver F. The index set $D_1$ enumerated by this driver at
this level is 20

$$D_1 = \mathrm{Proj}(\mathrm{IS}(F), (k_1)) = \mathrm{IS}(R) \quad \text{(by Theorem 3)}$$

thus satisfying the driving constraint for the output R.

In level 2 we have the input F and the driver F. The index set $D_2$ enumerated by this driver
at this level is

$$D_2 = \mathrm{IS}(F) = \mathrm{CIS}_R(F) \quad \text{(by Corollary 2)}$$

thus satisfying the driving constraint for the input F.

Example 2:

PAY IS HOURS * 3.00 IF HOURS PRESENT AND
                       NOT HOURS > 40
       ELSE 120 + (HOURS - 40) * 4.5 IF HOURS PRESENT

We shall use this example to illustrate Theorem 6. Define E1 and E2 by

E1 = HOURS * 3.00 IF HOURS PRESENT AND NOT HOURS > 40

and

E2 = 120 + (HOURS - 40) * 4.5 IF HOURS PRESENT AND
                                 NOT (HOURS PRESENT
                                      AND NOT HOURS > 40)

By pure logical simplification the last equation can be rewritten:

E2 = 120 + (HOURS - 40) * 4.5 IF HOURS PRESENT AND
                                 HOURS > 40

From Theorem 6 we have that 



20 Theorem 8 of the next section provides a formal treatment of enumerated index sets.
for each (item-id) from C
    get P(item-id)
    for each (store-id) from C(item-id)
        get C(item-id, store-id)
        EP(item-id, store-id) = ...
        if defined[EP(item-id, store-id)]
            then write EP(item-id, store-id)
    end
end
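The loop above can be sketched in Python. The report elides the expression for EP, so the product of P and C used below is purely an assumption made to obtain something runnable; the dictionary layouts and the function name are likewise hypothetical.

```python
def extended_price(price, current_order):
    """Compute EP with the flow C driving both loop levels.

    price:         dict item_id -> datum                (flow P)
    current_order: dict item_id -> dict store_id -> datum  (flow C)
    """
    ep = {}
    for item_id, stores in current_order.items():   # level 1, driven by C
        p = price.get(item_id)                       # get P(item-id); may be undefined
        for store_id, quantity in stores.items():    # level 2, driven by C
            # Assumed formula: EP is defined only when both inputs are present.
            if p is not None:
                ep[(item_id, store_id)] = p * quantity
    return ep


ep = extended_price(
    {1: 2.0, 3: 9.0},
    {1: {"a": 3, "b": 1}, 2: {"a": 1}},
)
```

Because C is the driver, the P record for item 3 is never fetched: no record of C could use it.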

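In level 1 the input is P and the driver is C. The index set $D_1$ enumerated by this driver at
this level is

$$D_1 = \mathrm{Proj}(\mathrm{IS}(C), (\text{item-id})) \supseteq \mathrm{IS}(P) \cap \mathrm{Proj}(\mathrm{IS}(C), (\text{item-id})) = \mathrm{CIS}_{EP}(P)$$

In level 2 the input is C, the output is EP and the driver is C. The index set $D_2$ enumerated
by this driver at this level is

$$D_2 = \mathrm{IS}(C) \supseteq \mathrm{Inj}(\mathrm{IS}(P), \mathrm{IS}(C)) \quad \text{(by Lemma 3)}$$
$$\mathrm{Inj}(\mathrm{IS}(P), \mathrm{IS}(C)) = \mathrm{CIS}_{EP}(C) = \mathrm{IS}(EP)$$

Thus we see that the flow C is (at least) adequate to drive both levels.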

IV.1.4 Driving Flow Set Sufficiency 

We wish to be able to determine whether a set of input flows is sufficient to drive a
computation loop level. Let us begin by defining the notion of the necessary index set for a
computation level:

Definition: The necessary index set at level i for a computation C (denoted $\mathrm{NIS}_i(C)$) is defined as
the set of index values necessary to drive level i of the totally nested loop implementing C.
IV.1.5 Minimal Driving Flow Sets

The set of all inputs of a computation is sufficient to drive that computation. We are
interested in finding the smallest subsets of this set that will provide sufficient drivers for each
level. This interest stems from our implementation constraint that all drivers must be read
sequentially and must have compatible sort orders. If all contained inputs were used to drive each
level of a computation loop, all inputs to that computation would have to have compatible sort
orders and all would have to be read sequentially, a constraint that is often unnecessarily severe.

Moreover, from an efficiency point of view, we generally want the set of indices enumerated
by the drivers at any level to be as small as possible (while satisfying the fundamental driving
constraints) so as to minimize the number of iterations. For example, if we are trying to minimize
I/O accesses and we have a loop that reads some (non-driving) flow by random access, the fewer
iterations there are the fewer attempts there will be to access records from that flow.

Consider, for example, the EP computation (Example 3 above). The inputs contained in the
outer loop are P and C. Both together could have been used as a driving flow set for that level.
We were able to show, however, that C alone was sufficient to drive the outer loop. Thus, we came
up with an implementation in which only the flow C had to be sorted and read sequentially.
Additionally, in this implementation only those records of P that can actually be used are fetched.

It is important to note that using some smallest driving flow set for each level does not
always improve efficiency. In the computation above it can be shown that P alone is sufficient to
drive the outer loop. However, such an implementation would be no better than one in which the
outer loop is driven by both inputs. Since the inner loop must be driven by C in any case, we
would still end up using both inputs as drivers; both would have to be sorted compatibly and read
sequentially; and more records of P would be read than would actually be used.
$$A \supseteq B \;\equiv\; B_{char} \Rightarrow A_{char}$$

The expression on the right of the equivalence symbol is a formula in the first order
predicate calculus. If this formula can be shown to be a tautology the corresponding set inclusion is
proved. Showing that a formula is a tautology is equivalent to showing that it simplifies to T.
Since powerful first order predicate calculus simplifiers exist, the task of proving set inclusion can
be solved by recasting the hypothesis as a predicate calculus formula and trying to simplify it. If it
can be simplified to T inclusion is proved; if it simplifies to F inclusion is disproved.

When the formula cannot be simplified to either T or F, the meaning of the result is not
clear. Either the simplification is correct (in which case the formula is not a tautology, and thus set
inclusion does not hold) or the simplifier has run up against a fundamental limitation 24 and has
failed to simplify the formula completely. In the latter case the formula may in fact be equivalent to
T (implying set inclusion), but the simplifier is unable to determine it. Because of this ambiguity,
the wisest assumption is the conservative one: whenever simplification to T does not occur, set
inclusion does not hold.
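The link between set inclusion and implication of characteristic functions can be illustrated on a small finite key domain. This sketch swaps the predicate calculus simplifier for exhaustive enumeration, which is only possible because the domain is assumed finite; the function name `includes` and the example predicates are hypothetical.

```python
from itertools import product


def includes(a_char, b_char, domains):
    """True iff B_char implies A_char for every key assignment.

    a_char, b_char: characteristic functions of index sets A and B.
    domains: one finite iterable of candidate values per key.
    A result of True corresponds to A being a superset of B.
    """
    return all(
        a_char(*keys) or not b_char(*keys)   # B_char(keys) => A_char(keys)
        for keys in product(*domains)
    )


def positive(k):
    return k > 0


def above_five(k):
    return k > 5


holds = includes(positive, above_five, [range(10)])
fails = includes(above_five, positive, [range(10)])
```

Unlike a simplifier, enumeration over a finite domain always returns a definite answer, so the conservative rule for inconclusive results is not needed in this toy setting.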

IV.2.1 Characteristic Functions for Index Sets 

In this section the particulars of the syntax 25 and semantics of characteristic functions for
index sets are presented.

The characteristic function for an index set is a logical expression (predicate) in terms of
the keys of its index that is true for an assignment of values to those keys in exactly those cases in
which the index set contains a corresponding index value. That is, if $S_{char}(k_1, \ldots, k_n)$ denotes the
characteristic function for the index set S then

$$S_{char}(v_1, \ldots, v_n) = T \iff S \text{ contains an index value with } k_1 = v_1, \ldots, k_n = v_n$$

24 It is a well-known fact that it is impossible to devise a procedure that will correctly simplify
every formula in the first order predicate calculus.

25 Because our work is implemented in the LISP programming language the notation is
unabashedly LISPish.
The logical operators from which characteristic functions are formed are

1. Standard logical operators 26

a. AND: (AND p1 ... pn) = T for a particular key-tuple instance iff all of the pi are
true for that instance

b. OR: (OR p1 ... pn) = T for a particular key-tuple instance iff any of the pi are
true for that instance

c. NOT: (NOT p) = T for a particular key-tuple instance iff p is false for that
instance

d. FOR-SOME: (FOR-SOME (k1 ... km) p(k1, ..., kn)) = T for a
particular key-tuple instance (km+1, ..., kn) iff there exist values for the keys k1, ..., km
such that the predicate p(k1, ..., kn) is true. This is existential quantification.

2. Standard arithmetic comparison operators
(in terms of variables and constants formed using the arithmetic operators +, -,
* and /)

a. EQUAL: (EQUAL expr1 expr2) = T iff expr1 and expr2 have the same numerical
value

b. GREATERP: (GREATERP expr1 expr2) = T iff the numerical value of expr1 is
greater than that of expr2

3. The special operator DEFINED: (DEFINED (V per k1 ... kn)) = T iff there is a record
in the variable V in period per for the key-tuple instance (k1, ..., kn). The argument to a
DEFINED operator must be a variable.

The terms introduced here are explained in greater detail in the following sections.

26 The symbols p and pi denote predicates.
IV.2.1.1 Variables



A variable is a representation of a HIBOL flow with key and period information attached.
The period uniquely identifies the variable in time (i.e. it specifies a particular "incarnation" of the
flow). An assignment of values to a variable's index and its period specifies an instance of that
variable and this instance is said to be defined if there is a datum (and thus record) corresponding
to the key and period values named in the assignment.

The general form for a variable is 

(flow-name period key_1 ... key_n)

where flow-name is the name of the associated flow 27 , the slot period contains the name of the
period in which the variable is generated or input, and the slots key_i contain the names of the keys
of the variable. An example of a variable specification is

(ENROLLED term student subject-number)

where

ENROLLED is the name of the variable

term is the name of a period

student and subject-number are the names of the variable's keys

An occurrence of a variable in a predicate is called a variable reference. In a variable
reference the form in the period slot identifies a particular incarnation of the variable (e.g. if the
period slot contains TERM, this term's incarnation of the variable is being referred
to; if it contains (PLUS TERM -1), last term's incarnation is referred to).



27 The variable and the flow have the same name. 
IV.2.1.2 (DEFINED variable-reference)

This expression is true if and only if variable-reference is defined. In particular an
expression like

(DEFINED (ENROLLED term student subject-number))

is true for an assignment of constant values to each of its keys and its period if and only if the
variable ENROLLED in the specified period contains a record corresponding to the specified index
value; otherwise it is false. Thus, for example, the predicate above is true for student = JOE,
subject-number = 33 and term = TERM if and only if in this term's incarnation of ENROLLED there is a
record for the index value (JOE 33) (i.e. if and only if Joe is enrolled in subject 33 during the current term).

IV.2.1.3 Correspondence Between Logical and Set Theoretic Notations

In our characteristic function/index set duality the general correspondence between logical
and set operators is given by:

logical operator                              set operator

AND                                   <=>     ∩
OR                                    <=>     ∪
(FOR-SOME (k_{j+1} ... k_n) S_char)   <=>     Proj(S, (k_1, ..., k_j))
(AND S_char C)                        <=>     Restr(S, C)
(AND S_char T_char)                   <=>     Inj(S, T)
(DEFINED (V ...))                     <=>     IS(V)



That is:

the characteristic function of the intersection of two sets is the logical AND of their
characteristic functions;

the characteristic function of the union of two sets is the logical OR of their characteristic
functions;

the characteristic function of the projection Proj(S, I') of an index set S onto the sub-index
I' is the FOR-SOME operator applied to the characteristic function of S and the remaining
keys;

the characteristic function of the restriction Restr(S, C) of an index set S by the condition C
is the logical AND of the characteristic function of S and the condition C;

the characteristic function of the injection Inj(S, T) of an index set S by the index set T is
the logical AND of their characteristic functions;

the characteristic function of the index set IS(V) of a variable V is the DEFINED operator
applied to that variable.

This mapping can be used to determine the characteristic function of any set expression
encountered above.

Examples: 

The index set 

IS(P) 

has the characteristic function 

(DEFINED (P DAY item-id))

The index set

IS(P) ∩ Proj(IS(C), (item-id))

has the characteristic function

(AND (DEFINED (P DAY item-id))
     (FOR-SOME (store-id) (DEFINED (C DAY item-id store-id))))

The index set

Restr(IS(HOURS), NOT HOURS > 40)

has the characteristic function

(AND (DEFINED (HOURS WEEK employee-id))
     (NOT (GREATERP (HOURS WEEK employee-id) 40)))
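The correspondence can be checked mechanically on small finite sets. The sketch below is only an illustration under assumptions: the contents of P and C, the key domains, and the helper proj_items are invented for the example. It verifies the second example above, that membership in IS(P) ∩ Proj(IS(C), (item-id)) agrees with the AND of DEFINED and FOR-SOME.

```python
# Hypothetical index sets: P is indexed by (item-id), C by (item-id, store-id).
items, stores = {1, 2, 3}, {"a", "b"}
P = {(1,), (3,)}
C = {(1, "a"), (2, "a"), (2, "b")}


def proj_items(index_set):
    """Proj(S, (item-id)): drop the store-id component of every index."""
    return {(item,) for (item, _store) in index_set}


def duality_holds():
    """Compare set membership with the logical form for every item value."""
    for item in items:
        defined_p = (item,) in P                               # (DEFINED (P DAY item-id))
        for_some = any((item, s) in C for s in stores)          # (FOR-SOME (store-id) ...)
        if ((item,) in proj_items(C)) != for_some:
            return False
        if ((item,) in (P & proj_items(C))) != (defined_p and for_some):
            return False
        if ((item,) in (P | proj_items(C))) != (defined_p or for_some):
            return False
    return True
```

The same loop also checks the union/OR row of the table, since both rows use the same two predicates.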
IV.2.2 Back-Substitution of Characteristic Functions

We would like our characteristic functions to contain as much information as possible
so as to be able to determine as much as possible about the inclusion properties of index sets.

The only possible characteristic function for a variable (V per k_1 ... k_n) that is a system
input (i.e. a variable whose flow is not computed by the system; for example a supplier list) is the
trivial one (DEFINED (V per k_1 ... k_n)), because all that can be said is that it contains a record
iff it contains a record.

In some cases an input variable may have the special property that it will always contain a
record for every allowable index value. (Knowledge of such a property cannot be deduced from
the HIBOL specification of a data processing system; it must be supplied separately.) Such a
variable is termed dense or full. An example might be the PRICE variable, which in every
incarnation should have a record for every possible value of the index (item-id). In such a case
the characteristic function of the variable is simply T.

We could use the trivial characteristic function for a computed variable as well, but more
(useful) information can be obtained through the application of Theorems 3-6 to the defining
HIBOL flow equation. Likewise, we can use Theorems 1 and 2 to obtain useful characteristic
functions for critical index sets. Characteristic functions thus obtained are called one-step
characteristic functions.

It should be easy to see that for any characteristic function, if an occurrence of (DEFINED
variable) is replaced by the characteristic function for variable, the result will be a logically
equivalent characteristic function. This is termed back-substitution of characteristic functions. If
back-substitution is applied recursively, the result will be a characteristic function containing only
DEFINED's whose arguments are non-computed variables. This is called total back-substitution.
Total back-substitution of all characteristic functions has the advantage of putting them all into a
uniform form, thus facilitating comparison and logical manipulation.
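Total back-substitution is a recursive rewrite, and can be sketched in a few lines of Python. This is an illustration under assumptions, not the ProtoSystem I implementation: characteristic functions are written as nested tuples mirroring the parenthesized notation, the function name back_substitute is hypothetical, the definitions are assumed to be non-circular, and every reference is assumed to use the same period and key names as the definition (so no renaming is needed).

```python
def back_substitute(expr, definitions):
    """Replace every (DEFINED (V ...)) whose V is computed by V's own
    characteristic function, recursively (total back-substitution).

    expr is a nested tuple such as
        ("OR", ("DEFINED", ("S", "DAY", "key")), ("DEFINED", ("X", "DAY", "key")))
    definitions maps each computed variable name to its one-step
    characteristic function; names absent from it are system inputs.
    """
    if not isinstance(expr, tuple):
        return expr                                   # a constant or a name
    if expr[0] == "DEFINED" and expr[1][0] in definitions:
        return back_substitute(definitions[expr[1][0]], definitions)
    return tuple(back_substitute(part, definitions) for part in expr)
```

After the call, the only DEFINED forms left are those whose argument is an input variable, which is the uniform form described above.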

IV.2.3 Example 

Consider the flow equations: 

S IS H * R IF H PRESENT AND R PRESENT

X IS (H - 48) * R / 2 IF H PRESENT AND R PRESENT AND H > 48

P IS S + X IF S PRESENT AND X PRESENT
     ELSE S IF S PRESENT
     ELSE X IF X PRESENT

where the flows H and R are system inputs, all flows have the index (key) and all computations are
performed daily. The one-step characteristic functions of the necessary input sets are: 28

NIS(S)_char = (AND (DEFINED (H DAY key))
                   (DEFINED (R DAY key)))

NIS(X)_char = (AND (DEFINED (H DAY key))
                   (DEFINED (R DAY key))
                   (GREATERP (H DAY key) 48))

NIS(P)_char = (OR (DEFINED (S DAY key))
                  (DEFINED (X DAY key)))

From these we deduce (by Theorem 9) the following results:

1. Computation S can be driven by either H or R, since both



28 We use the outputs as the computation names and drop the level subscript since there is only
one level.
NIS(S)_char -> (DEFINED (H DAY key)) (1a)

and

NIS(S)_char -> (DEFINED (R DAY key)) (1b)

are true.

2. Computation X can be driven by either H or R, since both

NIS(X)_char -> (DEFINED (H DAY key)) (2a)

and

NIS(X)_char -> (DEFINED (R DAY key)) (2b)

are true.

3. Computation P must be driven by both S and X, since neither

NIS(P)_char -> (DEFINED (S DAY key)) (3a)

nor

NIS(P)_char -> (DEFINED (X DAY key)) (3b)

are true, but

NIS(P)_char -> (OR (DEFINED (S DAY key)) (3c)
                   (DEFINED (X DAY key)))

is true.

However, we know that

IS(S)_char = (AND (DEFINED (H DAY key))
                  (DEFINED (R DAY key)))

IS(X)_char = (AND (DEFINED (H DAY key))
                  (DEFINED (R DAY key))
                  (GREATERP (H DAY key) 48))

so back-substitution of characteristic functions yields

NIS(P)_char = (OR (DEFINED (S DAY key))
                  (DEFINED (X DAY key)))

            = (OR (AND (DEFINED (H DAY key))
                       (DEFINED (R DAY key)))
                  (AND (DEFINED (H DAY key))
                       (DEFINED (R DAY key))
                       (GREATERP (H DAY key) 48)))

            = (AND (DEFINED (H DAY key))
                   (DEFINED (R DAY key)))

Thus, formula (3a)

NIS(P)_char -> (DEFINED (S DAY key))

becomes

(AND (DEFINED (H DAY key)) (DEFINED (R DAY key)))
->
(AND (DEFINED (H DAY key)) (DEFINED (R DAY key)))

which is obviously true. Thus, back-substitution has revealed that computation P can be driven by
S alone.
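The implications in this example can be confirmed by enumerating truth values. The following Python sketch is an illustration only: it models the three atomic facts ("H has a record", "R has a record", "H > 48") as booleans, which is an assumption made for the check and not part of the original derivation, and the names is_s, is_x and nis_p are hypothetical.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


# Every combination of: H defined, R defined, H > 48.
# H > 48 is only meaningful when H has a record, so those rows are dropped.
rows = [(h, r, big) for h, r, big in product([False, True], repeat=3)
        if h or not big]


def is_s(h, r, big):
    return h and r                     # back-substituted IS(S)_char


def is_x(h, r, big):
    return h and r and big             # back-substituted IS(X)_char


def nis_p(h, r, big):
    return is_s(h, r, big) or is_x(h, r, big)


p_driven_by_s = all(implies(nis_p(*row), is_s(*row)) for row in rows)
p_driven_by_x = all(implies(nis_p(*row), is_x(*row)) for row in rows)
```

With these definitions p_driven_by_s holds on every row, while p_driven_by_x fails on the row where both inputs are present but H does not exceed 48, so S alone can drive P but X alone cannot.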
extract the data item (the number of hours worked)

multiply it by 3.00

assemble the corresponding record of PAY

    whose employee-id key is the same as that of the record read

    whose data item's value is the result of multiplying the value of the data item of
    the record read by 3.00

write the newly created record to the file PAY

To support this iteration, there must be

declarations of the data objects to be used

loop initialization

EOF (end-of-file) checking (to terminate the loop)
V.1.1.1 Necessary Data Objects and Their Declaration

First there must be declarations for all input and output files. Assume that the files PAY and
HOURS are known by these names to the PL/I environment (JCL code can be generated to make
this happen). Then the following declarations must appear in the PL/I code:

DECLARE HOURS INPUT FILE SEQUENTIAL RECORD,
        PAY OUTPUT FILE SEQUENTIAL RECORD;

There must also be declarations for data structures ancillary to the I/O and control to be
performed. In particular, for every input file there must be a record image data structure into
which a record of that input can be read. Likewise, for every output file there must be a record
image data structure into which a record of that output can be built so that it can be written out.
In our simple example, the HOURS and PAY files must have such associated data objects. The PL/I
structure can be used for this purpose:

DECLARE 1 PAY_RECORD,
          2 EMPLOYEE FIXED DECIMAL (4),
          2 PAY FIXED DECIMAL (4),
        1 HOURS_RECORD,
          2 EMPLOYEE FIXED DECIMAL (4),
          2 HOURS FIXED DECIMAL (3);

Finally, for each input a flag is needed to indicate the EOF condition for that input. Thus, for the
HOURS file we would have the declaration:

DECLARE 1 EOF ALIGNED,
          2 HOURS BIT (1) UNALIGNED INITIAL ('0'B);

When EOF occurs on the associated file this flag is set to '1'B.

V.1.1.2 Loop Initialization

Before iteration all flags must be initialized. This can be done by the use of the INITIAL
statement in the declaration (as above for EOF.HOURS). Also all drivers must be read to establish
initial values for their indices. In our example, the initialization section would consist of merely:

READ FILE (HOURS) INTO (HOURS_RECORD);

V.1.1.3 EOF Checking and Loop Termination

To detect an EOF condition on a file and set its corresponding flag the PL/I ON construct
can be used. For the HOURS file the appropriate code would be

ON ENDFILE (HOURS) EOF.HOURS = '1'B;

To enforce iteration termination upon EOF of the driver, the loop is constructed using the
form DO WHILE (¬ EOF.driver).
V.1.1.4 The Loop Itself

Given this supporting structure, the rest of the implementation is easy. The loop itself can be
written simply as:

DO WHILE (¬ EOF.HOURS);
   PAY_RECORD.PAY = HOURS_RECORD.HOURS * 3.00;
   PAY_RECORD.EMPLOYEE = HOURS_RECORD.EMPLOYEE;
   WRITE FILE (PAY) FROM (PAY_RECORD);
   READ FILE (HOURS) INTO (HOURS_RECORD);
END;

When the loop terminates, the job step is ended and the input and output files are automatically
closed. The complete PL/I program for the pay calculation computation is given in Fig. 1.
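For readers unfamiliar with PL/I, the same single-driver loop can be sketched in Python. This is an illustrative analogue under assumptions, not generated code: the HOURS file is modelled as an iterable of (employee_id, hours) pairs, the PAY file as a list, and a None record stands in for the EOF flag. The function name pay_comp and the rate parameter are hypothetical.

```python
def pay_comp(hours_records, rate=3.00):
    """Single-driver loop: produce one PAY record per HOURS record.

    hours_records: iterable of (employee_id, hours) pairs standing in for
    the sequential HOURS file.
    """
    pay_file = []
    reader = iter(hours_records)
    record = next(reader, None)            # loop initialization: first read
    while record is not None:              # DO WHILE (not EOF.HOURS)
        employee, hours = record
        pay_file.append((employee, hours * rate))   # assemble and write PAY
        record = next(reader, None)        # read the next driver record
    return pay_file
```

As in the PL/I version, the read that ends the loop body is what eventually raises the end-of-file condition and terminates the iteration.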

V.1.2 Uniform-Index Matching Computations

Let us extend our treatment of single-level loops to computations with more than one
input. We use as our vehicle the variation of the pay calculation that includes a rate file (indexed
by employee-id):

PAY IS RATE * HOURS IF RATE PRESENT AND HOURS PRESENT

Suppose that the input files RATE and HOURS are to be read sequentially, that their records are
sorted by employee-id and that HOURS is used as the loop driver.

Again because the loop is driven by a single input file, it is implemented using the form DO
WHILE (¬ EOF.driver). However, the computation description dictates that a record of the
output file PAY for a given value of the key employee-id is to be produced if and only if there is a
record for that employee in HOURS and there is a corresponding record in the RATE file. Therefore,
in the body of the loop, before the output record can be calculated, the record (if any) of the
non-driving input that matches the current value of the driver's index must be found.
To find the matching record of the non-driving input we read successive records from its file,
comparing the index value of each record with the current loop index. The general matching
algorithm consists of the following loop:

For each non-driving input:

1. If FOUND.input is true (indicating that the record currently held in the input's image
structure has been used) read the next record of the input.

2. If an EOF condition has occurred on the input, set FOUND.input to false (0) and exit the
loop.

3. Otherwise, compare the index of the input record with that of the current
driver record:

If =, set FOUND.input to true and exit.

If <, read the next record of the input and go to step 2.

If >, there is no corresponding record in the input. Set FOUND.input to false (in case
the index of the record just read may match that of some subsequent driver record)
and exit.

To support this algorithm a flag FOUND.input must be declared for each non-driving input
and initialized to true (1) before the main loop.

The implementation of the rest of the main loop's body (following the matching code)
consists of code that attempts to compute the output record using only those non-driving inputs
whose FOUND flags are true. Basically, in this code, the PRESENT checks of the HIBOL description
become checks on the corresponding FOUND flags.
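The three-step matching algorithm above can be sketched as a Python generator. This is a hedged illustration rather than the generated PL/I: both inputs are assumed to be lists of (key, datum) pairs sorted by key with unique keys, a None record plays the role of the EOF condition, and the name match_loop is hypothetical.

```python
def match_loop(driver, other):
    """Yield (key, driver_datum, matched_datum_or_None) per driver record.

    driver, other: lists of (key, datum) pairs sorted by key.
    """
    reader = iter(other)
    current = None      # record image of the non-driving input
    eof = False         # EOF flag of the non-driving input
    found = True        # initialized to true so the first pass reads a record
    for key, datum in driver:
        if not eof:
            if found:                          # step 1: held record was used
                current = next(reader, None)
            while True:
                if current is None:            # step 2: EOF on the input
                    eof, found = True, False
                    break
                if current[0] == key:          # '=': matching record found
                    found = True
                    break
                if current[0] > key:           # '>': keep record for a later key
                    found = False
                    break
                current = next(reader, None)   # '<': skip ahead, back to step 2
        yield key, datum, (current[1] if found else None)


def pay(hours, rate):
    """PAY IS RATE * HOURS IF RATE PRESENT AND HOURS PRESENT, HOURS driving."""
    return [(k, h * r) for k, h, r in match_loop(hours, rate) if r is not None]
```

The final filter in pay corresponds to replacing the PRESENT check on the non-driving input by a check of its FOUND flag.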

This matching process must be implemented for every non-driving input in a data driven 
PAY_COMP: PROCEDURE;

(declarations)

ON ENDFILE (RATE) EOF.RATE = '1'B;
ON ENDFILE (HOURS) EOF.HOURS = '1'B;

READ FILE (RATE) INTO (RATE_RECORD);
LEVEL_1_MINIMUM.EMPLOYEE = RATE_RECORD.EMPLOYEE;

DO WHILE (¬EOF.RATE);

   IF ¬EOF.HOURS
   THEN DO; /* THIS READS ITEMS, SEQUENTIALLY, FROM A FILE UNTIL THE REQUESTED
               RECORD IS FOUND (SET FLAGS TO TRUE) OR PASSED (SET FLAGS TO FALSE). */
      IF FOUND.HOURS_RECORD
      THEN READ FILE (HOURS) INTO (HOURS_RECORD);
   HOURS_RECORD_COMPARE:
      IF EOF.HOURS
      THEN FOUND.HOURS_RECORD = '0'B;
      ELSE IF HOURS_RECORD.EMPLOYEE = LEVEL_1_MINIMUM.EMPLOYEE
      THEN FOUND.HOURS_RECORD = '1'B;
      ELSE IF HOURS_RECORD.EMPLOYEE > LEVEL_1_MINIMUM.EMPLOYEE
      THEN FOUND.HOURS_RECORD = '0'B;
      ELSE DO; READ FILE (HOURS) INTO (HOURS_RECORD);
               GO TO HOURS_RECORD_COMPARE;
           END;
   END;

   IF FOUND.HOURS_RECORD THEN DO; PAY_RECORD.PAY = RATE_RECORD.RATE * HOURS_RECORD.HOURS;
      PAY_RECORD.EMPLOYEE = LEVEL_1_MINIMUM.EMPLOYEE;
      WRITE FILE (PAY) FROM (PAY_RECORD);
   END;

   READ FILE (RATE) INTO (RATE_RECORD);
   LEVEL_1_MINIMUM.EMPLOYEE = RATE_RECORD.EMPLOYEE;

END;

END PAY_COMP;

Figure 2: PL/I code for PAY IS RATE * HOURS
First, notice that the iteration structure is fundamentally different from that for a single
driver loop. The index value determination and EOF checking is now performed at the beginning
of the loop body. 31 As always, the iteration is terminated when all drivers are exhausted (when the
loop's EOF flag ends up true after all drivers have been read). Thus the loop exit must appear
before the output calculations, and the loop cannot simply be terminated by DO WHILE (¬
EOF.driver) (as in the single driver case). This is just a minor variation on the basic scheme.

What is interesting in the implementation of Fig. 3 is the use of the PL/I ACTIVE structure
and the ACTIVE_DRIVER_COUNT variable in determining the proper next index value. The idea is
to look through the drivers in succession. The first is used to establish a tentative index value for
the current iteration. The first driver is also given a number that marks it active (for the time
being). If the next driver has the same index value it is given the same number, indicating that it
will be active when the first is. If it has a lower index value the loop index is reset and the second
driver is assigned a higher number, meaning that it is tentatively active (and, effectively, that the
first is inactive). When all drivers have been examined, those sharing the highest ACTIVE number
(held in ACTIVE_DRIVER_COUNT) are active.
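The numbering scheme described above can be sketched as follows. This is a hedged Python illustration, not a transcription of Fig. 3: the function name active_drivers and the dict-based representation of the drivers are assumptions, and the local variable count merely plays the role that the text assigns to ACTIVE_DRIVER_COUNT.

```python
def active_drivers(current_keys):
    """Return (loop_index, set of names of the active drivers).

    current_keys maps the name of each non-exhausted driver to the index
    value of the record it currently holds.
    """
    loop_index = None
    active = {}          # driver name -> tentative ACTIVE number
    count = 0            # highest ACTIVE number handed out so far
    for name, key in current_keys.items():
        if loop_index is None or key < loop_index:
            loop_index = key       # lower value: reset the loop index
            count += 1             # drivers numbered earlier become inactive
            active[name] = count
        elif key == loop_index:
            active[name] = count   # shares the tentative index value
        else:
            active[name] = 0       # higher value: inactive this iteration
    return loop_index, {n for n, number in active.items() if number == count}
```

A single pass over the drivers is enough because a driver is active exactly when its number equals the final value of the counter.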

Multiple-level loops introduce the need for maintenance of current index values for each
distinct loop level and for control structures to implement loop driving from loops at lower levels.
Multiple-level loops arise from two basic sources: reduction computations and mixed-index
matching computations. Let us examine the implementation of each in turn.



31 It could be done at the end of the body if the same code were duplicated as an initialization 
before the loop were entered. We have refrained from doing this to minimize code. 
ITEMDEMAND_COMP: PROCEDURE;
(declarations)

ON ENDFILE (DEMAND) EOF.DEMAND = '1'B;

READ FILE (DEMAND) INTO (DEMAND_RECORD);
IF ¬EOF.DEMAND
THEN DO; LEVEL_2_MINIMUM.ITEM = DEMAND_RECORD.ITEM;
         LEVELS_1_THRU_2_MINIMUM.ITEM = LEVEL_2_MINIMUM.ITEM;
     END;
ELSE LEVEL_1 = '0'B;

DO WHILE (LEVEL_1);

   DEFINED.ITEMDEMAND = '0'B;

   DO WHILE (LEVEL_2);
      IF DEFINED.ITEMDEMAND
      THEN ITEMDEMAND_RECORD.ITEMDEMAND = ITEMDEMAND_RECORD.ITEMDEMAND + DEMAND_RECORD.DEMAND;
      ELSE DO; ITEMDEMAND_RECORD.ITEMDEMAND = DEMAND_RECORD.DEMAND;
               DEFINED.ITEMDEMAND = '1'B;
           END;

      READ FILE (DEMAND) INTO (DEMAND_RECORD);

      IF ¬EOF.DEMAND
      THEN DO; LEVEL_2_MINIMUM.ITEM = DEMAND_RECORD.ITEM;
               IF LEVEL_2_MINIMUM.ITEM > LEVELS_1_THRU_2_MINIMUM.ITEM
               THEN LEVEL_2 = '0'B;
           END;
      ELSE DO; LEVEL_2 = '0'B;
               LEVEL_1 = '0'B;
           END;
   END;

   ITEMDEMAND_RECORD.ITEM = LEVEL_1_MINIMUM.ITEM;
   WRITE FILE (ITEMDEMAND) FROM (ITEMDEMAND_RECORD);

   IF ¬EOF.DEMAND
   THEN LEVELS_1_THRU_2_MINIMUM.ITEM = LEVEL_2_MINIMUM.ITEM;

END;

END ITEMDEMAND_COMP;

Figure 4: PL/I code for ITEMDEMAND IS THE SUM OF DEMAND FOR EACH ITEM-ID
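The two-level structure of this reduction is easier to see in a compact form. The Python sketch below is an analogue written for illustration, under the assumption that DEMAND is a list of ((item_id, store_id), quantity) pairs sorted by item_id; the function name item_demand is hypothetical and the code is not a line-by-line translation of Fig. 4.

```python
def item_demand(demand_records):
    """ITEMDEMAND IS THE SUM OF DEMAND FOR EACH ITEM-ID.

    demand_records: list of ((item_id, store_id), quantity) pairs sorted by
    item_id, standing in for the sequential DEMAND file.
    """
    output = []
    reader = iter(demand_records)
    record = next(reader, None)                 # initial read of the driver
    while record is not None:                   # level 1: one pass per item-id
        item = record[0][0]
        total = 0
        # Level 2: accumulate until the item-id changes or the file ends.
        while record is not None and record[0][0] == item:
            total += record[1]
            record = next(reader, None)
        output.append((item, total))            # write the ITEMDEMAND record
    return output
```

The inner loop exits on a sub-index change or on EOF, and the output record is written in the outer loop once the sum is complete.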
a. one is found that has an item-id value matching the driver's item-id value, in which
case all EXTENDEDPRICE records for that value can be generated; or

b. one is found that has an item-id value greater than the driver's, or the PRICE file is
exhausted, in which case there is no matching value and the inner loop can be skipped.

2. (inner loop) Generate all output records for the given item-id value, reading records from the
driver as you go. When a driver record is read that has an item-id value greater than that of the
current PRICE record, or the driving file is exhausted, exit.

3. If neither input file is exhausted go to step 1 and repeat; otherwise exit.

In this way each record of the PRICE file is read only once. 32

A PL/I implementation of this algorithm is shown in Fig. 5. The reader will notice that this
implementation is unnecessarily inefficient because when a matching PRICE record is not found the
inner loop is executed anyway. This is done to illustrate what happens in the general case where
there may be calculations in the inner loop that can still be performed without the use of a missing
input.
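The mixed-index matching algorithm can be sketched in Python as follows. This is an illustration under assumptions, not the code of Fig. 5: CURRENTORDER is modelled as a list of ((item_id, store_id), quantity) pairs sorted by item_id, PRICE as a list of (item_id, unit_price) pairs sorted by item_id, and the function name extended_price is hypothetical. Like the PL/I version described above, it runs the inner loop even when no matching PRICE record exists; unlike step 3, it keeps consuming driver records after PRICE is exhausted, which yields the same output.

```python
def extended_price(current_order, price):
    """EXTENDEDPRICE IS PRICE * CURRENTORDER IF both PRESENT."""
    output = []
    orders = iter(current_order)
    prices = iter(price)
    order = next(orders, None)
    price_rec = next(prices, None)
    while order is not None:
        item = order[0][0]
        # Step 1: advance PRICE until its item-id reaches the driver's.
        while price_rec is not None and price_rec[0] < item:
            price_rec = next(prices, None)
        matched = price_rec is not None and price_rec[0] == item
        # Step 2 (inner loop): consume every driver record for this item-id.
        while order is not None and order[0][0] == item:
            if matched:
                output.append((order[0], price_rec[1] * order[1]))
            order = next(orders, None)
    return output
```

Each PRICE record is fetched from the iterator at most once, which is the property the algorithm is designed to guarantee.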


V.I Aggregated Computations 

The aggregation of two or more computations into one nested loop introduces a consideration
not seen before: the synchronization of computations at different loop levels. Consider the two
HIBOL computations:

EXTENDEDPRICE IS PRICE * CURRENTORDER IF PRICE PRESENT
                                      AND CURRENTORDER PRESENT

VALUESHIPPED IS PRICE * ITEMDEMAND IF PRICE PRESENT
                                   AND ITEMDEMAND PRESENT

32 If CURRENTORDER had been unsorted or sorted differently, records from PRICE would
generally be read more than once.
where CURRENTORDER is the same as above (with index (item-id, store-id)) and ITEMDEMAND is
a file with index (item-id). As we have seen above, the first computation can be implemented as
a two-level nested loop. The second computation iterates over the single key item-id and so has
only one level.

When aggregated the result is a two-level loop: 33

Level:     (item-id)
Inputs_P:  {PRICE, ITEMDEMAND}
Prolog:    calculate value-shipped
Outputs_P: empty
Epilog:    empty

Loop 2 (inner loop)

   Level:     (item-id, store-id)
   Inputs_P:  {CURRENTORDER}
   Prolog:    calculate extended-price
   Outputs_P: {EXTENDEDPRICE}
   Inputs_E:  empty
   Epilog:    empty
   Outputs_E: empty

What is significant here is that the computations in the aggregate occur in different levels.
Suppose that the PRICE file is guaranteed to have a record for every item-id. Then ITEMDEMAND
is the natural choice for a driver for the value-shipped computation because a record of the output
will be generated if and only if there is a record in ITEMDEMAND for the same key. As for the
extended-price computation, CURRENTORDER is the only possible choice for the driver.

Now the outer loop iterates over item-id values determined by both drivers. Suppose the
first record of each driver is read. There are three cases, distinguished by the relative values of the
item-id keys in these records:

33 Notice that in the finalized loop description there is no General section.
(declarations)

(ON conditions)

(read CURRENTORDER and initialize LEVEL_2_MINIMUM.ITEM = CURRENTORDER_RECORD.ITEM;)

(read ITEMDEMAND and initialize LEVEL_1_MINIMUM.ITEM = ITEMDEMAND_RECORD.ITEM;)

(code to set the synchronization flag for each level to false if its driver had no records)
(comparison of ITEM values to set synchronization flags:
   IF LEVEL_2_MINIMUM.ITEM > LEVEL_1_MINIMUM.ITEM
   THEN DO; DO_LEVEL_1 = '1'B;
            LEVEL_2 = '0'B;
            LEVELS_1_THRU_2_MINIMUM.ITEM = LEVEL_1_MINIMUM.ITEM;
        END;
   ELSE IF LEVEL_2_MINIMUM.ITEM < LEVEL_1_MINIMUM.ITEM
   THEN DO; DO_LEVEL_1 = '0'B;
            LEVEL_2 = '1'B;
            LEVELS_1_THRU_2_MINIMUM.ITEM = LEVEL_2_MINIMUM.ITEM;
        END;
   ELSE DO; DO_LEVEL_1 = '1'B;
            LEVEL_2 = '1'B;
            LEVELS_1_THRU_2_MINIMUM.ITEM = LEVEL_1_MINIMUM.ITEM;
        END; )

DO WHILE (LEVEL_1);

   (read PRICE record)

   IF DO_LEVEL_1 THEN (calculate value-shipped) /* Prolog LEVEL_1 */;

   DO WHILE (LEVEL_2);

      IF FOUND.PRICE_RECORD THEN (calculate and write extended-price)
      (read CURRENTORDER and reset LEVEL_2_MINIMUM.ITEM = CURRENTORDER_RECORD.ITEM;)
      (check for eof)
      IF LEVEL_2_MINIMUM.ITEM > LEVELS_1_THRU_2_MINIMUM.ITEM
      THEN LEVEL_2 = '0'B;
      ELSE LEVEL_2 = '1'B;
   END /* LEVEL_2 */;

   IF DO_LEVEL_1 THEN DO /* Epilog LEVEL_1 */;
      IF FOUND.PRICE_RECORD THEN (write value-shipped record)
      (read ITEMDEMAND and reset
       LEVEL_1_MINIMUM.ITEM = ITEMDEMAND_RECORD.ITEM;)
   END /* Epilog LEVEL_1 */;

   (synchronization code exactly as above)

END /* LEVEL_1 */;

Figure 6: Illustration of synchronization code for aggregated computations
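The synchronization of the two levels can also be sketched in Python. This is a simplified illustration under assumptions, not the code of Fig. 6: ITEMDEMAND is a list of (item_id, quantity) pairs and CURRENTORDER a list of ((item_id, store_id), quantity) pairs, both sorted by item_id; PRICE is assumed to be full and is modelled as a dict; the function name aggregate is hypothetical. The comparison of the two drivers' current item-id values plays the role of the DO_LEVEL_1 and LEVEL_2 flags.

```python
def aggregate(item_demand, current_order, price):
    """Compute VALUESHIPPED and EXTENDEDPRICE in one nested loop.

    Returns (value_shipped, extended_price) as two lists of records.
    """
    value_shipped, extended = [], []
    i = j = 0                                    # positions in the two drivers
    while i < len(item_demand) or j < len(current_order):
        k1 = item_demand[i][0] if i < len(item_demand) else None
        k2 = current_order[j][0][0] if j < len(current_order) else None
        # The outer loop index is the smaller of the drivers' current keys.
        item = min(k for k in (k1, k2) if k is not None)
        if k1 == item:                           # level-1 computation is due
            value_shipped.append((item, price[item] * item_demand[i][1]))
            i += 1
        # Level 2 runs only while CURRENTORDER holds records for this item-id.
        while j < len(current_order) and current_order[j][0][0] == item:
            index, quantity = current_order[j]
            extended.append((index, price[item] * quantity))
            j += 1
    return value_shipped, extended
```

When only one driver has a record for the current item-id, the other level is simply skipped for that iteration, which is the purpose of the synchronization flags.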
PAY_COMP: PROCEDURE;

DECLARE DSAG1 INPUT FILE SEQUENTIAL RECORD,
        PAY OUTPUT FILE SEQUENTIAL RECORD;
DECLARE 1 PAY_RECORD,
          2 EMPLOYEE FIXED DECIMAL (4),
          2 PAY FIXED DECIMAL (4),
        1 DSAG1_RECORD,
          2 EMPLOYEE FIXED DECIMAL (4),
          2 DEFINED ALIGNED,
            3 HOURS BIT (1),
            3 OVERTIME BIT (1),
          2 HOURS FIXED DECIMAL (3),
          2 OVERTIME FIXED DECIMAL (3),
          2 EMPLOYEE FIXED DECIMAL (4),
          2 HOURS FIXED DECIMAL (3);
DECLARE 1 EOF ALIGNED,
          2 DSAG1 BIT (1) UNALIGNED INITIAL ('0'B);

ON ENDFILE (DSAG1) EOF.DSAG1 = '1'B;

READ FILE (DSAG1) INTO (DSAG1_RECORD);

DO WHILE (¬ EOF.DSAG1);

   IF DSAG1_RECORD.DEFINED.HOURS
   THEN DO;
      PAY_RECORD.PAY = DSAG1_RECORD.HOURS * 3.0;
      PAY_RECORD.EMPLOYEE = DSAG1_RECORD.EMPLOYEE;
      WRITE FILE (PAY) FROM (PAY_RECORD);
      READ FILE (DSAG1) INTO (DSAG1_RECORD);
   END;
   ELSE
      READ FILE (DSAG1) INTO (DSAG1_RECORD);

END;

END PAY_COMP;

Figure 7: PL/I code for PAY IS HOURS * 3.00 with Aggregated Flow
used. If the sort orders are compatible the method of access is completely analogous to sequential
access except that "records" are "read" from the table instead of secondary storage (see Fig. 8).

If the input file is "randomly" organized (regional (2)) the access code generates a hash index
and then mimics the PL/I access procedure: compare the key values of the indicated table entry
with the desired ones; if identical stop; otherwise examine successive entries in wrap-around fashion
until an empty slot is found (end of the bucket) or a complete cycle has been made. If the sort
orders are not compatible a more complicated binary search is implemented.
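The wrap-around probing described for "randomly" organized tables can be sketched as follows. This Python fragment is an illustration under assumptions: the table is a non-empty list whose slots are either None (empty) or a (key, datum) pair, and both the function name probe and the hash function passed to it are hypothetical, since the source does not give the hash actually used.

```python
def probe(table, key, hash_fn):
    """Look up key in an open-addressed table; return its datum or None.

    table: non-empty list whose slots are None (empty) or (key, datum).
    hash_fn: maps a key to a non-negative integer.
    """
    size = len(table)
    start = hash_fn(key) % size
    for step in range(size):                   # at most one complete cycle
        slot = table[(start + step) % size]    # wrap-around examination
        if slot is None:                       # empty slot: end of the bucket
            return None
        if slot[0] == key:                     # identical key values: stop
            return slot[1]
    return None                                # full cycle made, not found
```

The two exits that return None correspond to the two stopping conditions in the text: an empty slot, or a complete cycle through the table.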

V.5.3 Random Access 

When the records of an input are directly (regional (2)) organized the file is randomly
accessed. Instead of using a loop, as with sequential access, a single read using a calculated key is
executed. For example, if the PRICE file in the EXTENDEDPRICE computation (above) were
randomly accessed, the accessing part of the code would be:

PRICE_RECORD_HASH_VALUE = MOD (5 * (MOD (LEVEL_2_MINIMUM.ITEM, )), );
PRICE_RECORD_HASH_VALUE_STRING = PRICE_RECORD_HASH_VALUE;
PRICE_RECORD_HASH_KEY =
      LEVEL_2_MINIMUM.ITEM || PRICE_RECORD_HASH_VALUE_STRING;
FOUND.PRICE_RECORD = '1'B;
READ FILE (PRICE) INTO (PRICE_RECORD) KEY (PRICE_RECORD_HASH_KEY);

The first three statements calculate the source key string which has two parts: the region number
(rightmost 8 characters) and the comparison key (the remaining characters). The case where the
record is not present is handled by the statement:

ON KEY (PRICE) IF ONCODE = 51 THEN FOUND.PRICE_RECORD = '0'B;

which resets the FOUND flag if a "keyed record not found" error occurs.
IF ¬EOF.PRICE
THEN DO; IF FOUND.PRICE_RECORD
         THEN IF PRICE_RECORD_INDEX <= PRICE_RECORD_SIZE
              THEN PRICE_RECORD_INDEX = PRICE_RECORD_INDEX + 1;
              ELSE EOF.PRICE = '1'B;

   PRICE_RECORD_COMPARE:
         IF EOF.PRICE
         THEN FOUND.PRICE_RECORD = '0'B;
         ELSE IF PRICE_RECORD.ITEM = LEVELS_1_THRU_2_MINIMUM.ITEM
         THEN FOUND.PRICE_RECORD = '1'B;
         ELSE IF PRICE_RECORD.ITEM > LEVELS_1_THRU_2_MINIMUM.ITEM
         THEN FOUND.PRICE_RECORD = '0'B;
         ELSE DO; IF FOUND.PRICE_RECORD
                  THEN IF PRICE_RECORD_INDEX <= PRICE_RECORD_SIZE
                       THEN PRICE_RECORD_INDEX =
                            PRICE_RECORD_INDEX + 1;
                       ELSE EOF.PRICE = '1'B;
                  GO TO PRICE_RECORD_COMPARE;
              END;
     END;

Figure 8: PL/I Code for Reading PRICE by Core Table in the Extended Price Computation
V.6 The General Case - A Summary

We have seen that the basic code structure for a computation consists of the following four
parts: 35

declarations

on-conditions

loop initialization

the nested loop 36

The basic structure of the body of each loop in the nested loop is as follows:

read & match non-driving inputs

Prolog calculations

inner loop (if any)

Epilog calculations

write outputs

read active drivers

determine new active drivers
and index values for the next iteration

loop synchronization code

exit on EOF or (for inner loop) sub-index change

It may be interesting to note that ProtoSystem I's code generator generates these sections
simultaneously as four separate output streams (rather than sequentially) that are catenated together
when they are all finished.

There is no clean-up code following the loop because the end of the job step which is the
computation does everything necessary, including the closing of files.
Appendix I: The Simple Expositional Artificial Language (SEAL)

As an aid to discussing loops we invent an artificial language similar in form to traditional
high-level languages such as ALGOL, PL/I and FORTRAN. The basic constructs of this
language are:

Iteration: expressed by the construct:

for each <loop-index> from <driving-flow-set>
   <body>
end

which has the meaning: perform the actions contained in the <body> for each value of the
<loop-index> obtained from the flows in the <driving-flow-set>. <loop-index> is either the
name of the index associated with the flows in the <driving-flow-set> or (for reasons that
become evident in this paper) a sub-index of corresponding sub-flows. The set of values that the
<loop-index> takes on is the union of the index sets of the drivers. This set is enumerated at
execution time by reading successive records of the drivers.
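The enumeration performed by for each, the union of the drivers' index sets taken in sorted order, can be sketched with Python's standard heapq.merge. This is an illustration under the assumption that each driver is a list of (index, datum) pairs already sorted by index; the function name for_each is hypothetical.

```python
import heapq


def for_each(*drivers):
    """Yield each distinct index value of the drivers once, in sorted order.

    Each driver is a list of (index, datum) pairs sorted by index.
    """
    previous = object()      # sentinel that compares unequal to any index
    key_streams = (map(lambda record: record[0], driver) for driver in drivers)
    for index in heapq.merge(*key_streams):
        if index != previous:        # skip values shared by several drivers
            yield index
            previous = index
```

Because the merge consumes the drivers lazily, the index values are produced while the records are being read rather than computed in advance, in the spirit of the text.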
I/O and defined: input (record fetching) is expressed by the get operator, thus:

get <variable-instance>

where <variable-instance> specifies a flow and a particular value for its index, represented as a
variable (see below). A statement like this means: fetch the indicated record if it exists.
Output is expressed by the write operator, similarly:

write <variable-instance>

The defined operator is a logical operator for use in conditional expressions. It is
applicable only to flow variable instances. The form

defined[<variable-instance>]
evaluates to "true" if the specified record of the indicated flow exists. In particular, if the record is
an input (obtained through a get) it is "defined" if and only if the record was present in the input;
if the record is an output it is "defined" if and only if the generating code produced a datum for
the record.

Conditional Execution: expressed by the familiar if-then-else construct:

if <condition> then <statement-list>_1
               else <statement-list>_2

which means that if the logical expression <condition> evaluates to "true" perform the statements
in <statement-list>_1; otherwise, perform the statements in <statement-list>_2.

Logical expressions can be formed using the arithmetic comparison operators, the defined
operator, and the logical connectives and, or and not.

Conditional Expressions: expressed by the construct:

if <condition> then <expression>_1
               else <expression>_2

which evaluates to the value of <expression>_1 if the logical expression <condition> evaluates to
"true" and to the value of <expression>_2 otherwise.

Variables and Assignment: expressed by the construct:

<variable> = <expression>

where = is the assignment operator.

A variable can be either a scalar or an indexed variable. Flows are represented as indexed
variables with an index identical to the flow's index. Thus, DEMAND(item-id, store-id) is the
variable corresponding to the DEMAND flow and an instance of its index selects the datum of the
corresponding flow record. That is, for example, the statement

DEMAND(1234, 5678) = CURRENTORDER(1234, 5678) +
                     BACKORDER(1234, 5678)
means that the datum of the record of DEMAND for item #1234 ordered by store #5678 is to get the
value obtained by adding the data of the corresponding records from CURRENTORDER and
BACKORDER.

Typically, the record-by-record computation implied by a HIBOL flow equation would look
like that equation translated into our artificial language (with a general index), such as

DEMAND(item-id, store-id) =
   if defined[CURRENTORDER(item-id, store-id)]
      and defined[BACKORDER(item-id, store-id)]
   then CURRENTORDER(item-id, store-id) +
        BACKORDER(item-id, store-id)
   else if defined[CURRENTORDER(item-id, store-id)]
   then CURRENTORDER(item-id, store-id)
   else if defined[BACKORDER(item-id, store-id)]
   then BACKORDER(item-id, store-id)
   else undefined

and would appear somewhere in the body of a loop.
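The conditional expression for DEMAND translates almost word for word into Python. In this sketch, which is an illustration and not part of SEAL, each flow is modelled as a dict from (item_id, store_id) to its datum, a missing key plays the role of an undefined record, None stands for "undefined", and the function name demand is hypothetical.

```python
def demand(current_order, back_order, index):
    """Return the DEMAND datum for index, or None when it is undefined.

    current_order, back_order: dicts mapping (item_id, store_id) to a datum.
    """
    if index in current_order and index in back_order:
        return current_order[index] + back_order[index]
    if index in current_order:
        return current_order[index]
    if index in back_order:
        return back_order[index]
    return None                      # corresponds to "else undefined"
```

Each `index in flow` test is the counterpart of a defined[...] check in the SEAL text.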

Sub-flows: A sub-flow (for use in the for each construct) is expressed by:

<flow-variable>(<sub-index>)

For example,

CURRENTORDER(item-id)

denotes the sub-flow of CURRENTORDER consisting of just those records whose indices correspond to
the value of the sub-index (item-id). Generally, the value of the indicated sub-index is fixed by an
enclosing loop.
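A sub-flow amounts to selecting the records of a flow that agree with a fixed sub-index value. The Python sketch below illustrates this under assumptions: the flow is a dict keyed by (item_id, store_id), the sub-index is its first component, and the function name sub_flow is hypothetical.

```python
def sub_flow(flow, item_id):
    """Records of flow whose index begins with the given item-id value.

    flow: dict mapping (item_id, store_id) to a datum.
    """
    return {index: datum for index, datum in flow.items()
            if index[0] == item_id}


# Usage: an enclosing loop fixes item_id; the inner loop ranges over the sub-flow.
CURRENTORDER = {(1, "a"): 2, (1, "b"): 5, (2, "a"): 7}
inner_records = sub_flow(CURRENTORDER, 1)
```

In a sequential implementation the sub-flow is never materialized like this; it is simply the run of consecutive records sharing the current item-id.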